COMPOSITE BINDING POLYPEPTIDES

TECHNICAL FIELD 

The present disclosure is in the fields of molecular biology and protein design; in
particular, the design of sequence-specific binding proteins for regulation of gene
expression.

BACKGROUND

Protein-nucleic acid recognition is a commonplace phenomenon that is central to a large 
number of biomolecular control mechanisms that regulate the functioning of eukaryotic 
and prokaryotic cells. For instance, protein-DNA interactions form the basis of the 
regulation of gene expression and are thus one of the subjects most widely studied by
molecular biologists. 

A wealth of biochemical and structural information explains the details of protein-DNA 
recognition in numerous instances, to the extent that general principles of recognition 
have emerged. Many DNA-binding proteins contain independently folded domains for
the recognition of DNA, and these domains in turn belong to a large number of structural
families, such as the leucine zipper, the "helix-turn-helix" and zinc finger families.

Despite the great variety of structural domains, the specificity of the interactions observed
to date between protein and DNA most often derives from the complementarity of the
surfaces of a protein α-helix and the major groove of DNA. See, e.g., Klug, (1993) Gene
135:83-92. In light of the recurring physical interaction of α-helix and major groove, the
tantalising possibility arises that the contacts between particular amino acids and DNA
bases could be described by a simple set of rules; in effect a stereochemical recognition
code which relates protein primary structure to binding-site sequence preference.

It is clear, however, that no code will be found which can describe DNA recognition by
all DNA-binding proteins. The structures of numerous complexes show significant
differences in the way that the recognition α-helices of DNA-binding proteins from
different structural families interact with the major groove of DNA, thus precluding
similarities in patterns of recognition. The majority of known DNA-binding motifs are
not particularly versatile, and any codes which might emerge would likely describe
binding to a very few related DNA sequences.



Even within each family of DNA-binding proteins, moreover, it has hitherto appeared
that the deciphering of a code would be elusive. Due to the complexity of the
protein-DNA interaction, there does not appear to be a simple "alphabetic" equivalence
between the primary structures of protein and nucleic acid which specifies a direct amino
acid to base relationship.

International patent application WO 96/06166 addresses this issue and provides a
"syllabic" code that explains protein-DNA interactions for zinc finger nucleic acid
binding proteins. A syllabic code is a code that relies on more than one feature of the
binding protein to specify binding to a particular base, the features being combinable in
the forms of "syllables", or complex instructions, to define each specific contact. Segal,
D. J., Dreier, B., Beerli, R. R. & Barbas, C. F. (1999) Proc. Natl. Acad. Sci. USA 96,
2758-2763 present a method of constructing zinc finger polypeptides, based on 16
individual zinc finger domains which bind sequences of the form 5'-GXX-3', where X is
any base. See also U.S. Patent No. 6,140,081. The latter method has the severe
limitation that it does not provide instructions permitting the specific targeting of triplets
containing nucleotides other than G in the 5' position of each triplet, which greatly
restricts the potential target sequences of such generated zinc finger peptides.

International patent application WO 98/53057 addresses the above problems by
recognizing that zinc fingers can specify overlapping 4 bp subsites, and therefore synergy
between adjacent zinc finger domains is an important consideration in selecting zinc
finger nucleic acid-binding domains to specifically target any sequence.

With the recent completion of the human genome project and the rapidly advancing fields
of transgenic animals and plants, thousands of uncharacterised (and characterised) genes
have (and will) become valid targets for functional genomics and other such projects.
Concomitantly, 'designer' zinc finger peptides are emerging as one of the most universal
and desirable ways of regulating the expression of specific genes within cells. See, for
example, Choo, Y., Sanchez-Garcia, I. & Klug, A. (1994) Nature 372: 642-645; Beerli,
R. R., Dreier, B. & Barbas, C. F. III (2000) Proc. Natl. Acad. Sci. USA 97: 1495-1500;
Kim, J-S. & Pabo, C. O. (1998) Proc. Natl. Acad. Sci. USA 95: 2812-2817; Kang, J. S. &
Kim, J-S. (2000) J. Biol. Chem. 275: 8742-8748; Zhang et al. (2000) J. Biol. Chem.
275:33,850-33,860; Liu et al. (2001) J. Biol. Chem. 276:11,323-11,334; and Ren et al.
(2002) Genes Dev. 16:27-32. See also WO 00/41566 and WO 01/19981. Hence, a
rapid method of creating multi-zinc finger peptides for the up- or down-regulation of any
specific gene is highly desirable.

As stated above, synergy between adjacent zinc finger peptides is an important factor in
specific DNA recognition. Moreover, the findings reported in co-owned WO 01/53480,
which is hereby incorporated by reference, demonstrate that poly-zinc finger peptides
constructed from strings of 2-finger domains can provide greater DNA binding
specificity.

Traditional strategies of zinc finger mutagenesis and selection, such as phage display,
particularly if employed for the selection of 2-zinc finger units to target any desired
binding site, are limited by the size of the library that can be cloned into host/vector
systems, such as phage. Due to limitations in library size imposed by such constraints, it
is impossible to include an exhaustive combination of randomisations to cover all
potentially important sequence-space. Furthermore, for important applications of
engineered zinc finger peptides, such as for gene therapy or transgenic animal systems,
engineered zinc finger peptides run the significant risk of eliciting a harmful
immunological reaction in the host animal.

The human genome sequencing project has also revealed the presence of almost 700
endogenous zinc finger-containing proteins. Assuming that each of these proteins
contains at least 2 finger modules, there are probably at least 2,000 natural zinc finger
modules in the human genome alone. Similar numbers are expected in other animal and
plant genomes.

SUMMARY 

The present invention recognises the potential importance of designer zinc finger peptides 
in therapeutic and transgenic applications in animals and plants. Furthermore the present 
invention acknowledges that the safety of such applications is of primary importance. 

The present invention provides the isolation of natural zinc finger modules, from
genomes such as human, mouse, chicken, Arabidopsis and other species, and the
construction of non-natural combinations of such zinc finger modules, to create
multi-finger domains, and to provide and determine novel nucleic acid binding specificities.
Such a procedure will allow the identification of the novel zinc finger domains that bind
any desired nucleic acid sequence, particularly sequences of between 6 and 10
nucleotides long. The first advantage of such technology is that millions of years of
natural evolution, to create specific nucleotide-binding zinc finger modules, are captured
to create novel nucleic acid-binding domains. Also, use of poly-zinc finger peptides
constructed from such units for targeted gene regulation avoids the potentially harmful
effects of host immune responses. The present invention thus greatly enhances the
possibilities for the use of zinc finger transcription factors for in vivo applications, such as
gene therapy and transgenic animals.

In a first aspect, therefore, there is provided a composite binding polypeptide comprising
a first natural binding domain derived from a first natural binding polypeptide, and a
second natural binding domain derived from a second natural binding polypeptide,
wherein said first and second natural binding polypeptides may be the same or different;
which polypeptide binds to a target, said target differing from the natural target of
both the first and the second binding polypeptides.

Preferably, said first and second natural binding polypeptides are different polypeptides. 



WO 02/099084



PCT/US02/22272 




Binding polypeptides according to the invention comprise two or more natural binding 
domains, advantageously three or more natural binding domains; advantageously, six or 
more domains are included. These are preferably arranged in a 3x2 conformation, 
separated by linker sequences.

The binding domains are preferably nucleic acid binding domains, and the composite 
polypeptide is preferably a nucleic acid binding polypeptide. Most preferably, the 
composite polypeptide is a zinc finger polypeptide, and the natural binding domains are 
zinc finger domains.

Zinc finger binding domains can comprise any type of zinc finger or zinc-coordinated
structure including, but not limited to, Cys2-His2 (SEQ ID NO:1) zinc finger binding
domains or Cys3-His (SEQ ID NO:2) zinc finger binding domains.


In a further aspect, there is provided a library of natural binding domains. The natural
binding domains are the domains that may be assembled into polypeptides according to
the previous aspect of the invention. Preferably, the library is of natural zinc finger
nucleic acid binding domains.


Said zinc finger domains may comprise a linker attached thereto. Any linker amino acid 
sequence known in the art can be used. Advantageously, the linker comprises the amino 
acid sequence TGEKP (SEQ ID NO:3). 

In a further aspect, the invention provides a method for selecting a binding polypeptide
capable of binding to a target site, comprising:

(a) providing a library of natural binding domains;

(b) assembling two or more of said domains to form a composite polypeptide;

(c) screening said composite polypeptide against the target site in order to
determine its ability to bind the target site.
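Steps (a) to (c) can be pictured as an enumerate-and-screen loop. The following is a minimal sketch only: the `library` mapping, the placeholder domain names and the `binds` callable (which stands in for a laboratory binding assay) are assumptions introduced for illustration and are not part of the disclosure. Allowing the same domain to appear more than once in a composite is likewise a simplifying assumption.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List, Tuple

Combo = Tuple[str, ...]


def composites(library: Dict[str, str], n_domains: int = 2) -> Iterator[Combo]:
    """Step (b): yield every ordered combination of n_domains library members.

    Repeats are allowed here (assumption), so a library of k domains
    yields k ** n_domains candidate composites.
    """
    return product(sorted(library), repeat=n_domains)


def screen(library: Dict[str, str], target: str,
           binds: Callable[[Combo, str], bool], n_domains: int = 2) -> List[Combo]:
    """Step (c): keep the combinations that the supplied assay reports as binders."""
    return [combo for combo in composites(library, n_domains) if binds(combo, target)]
```

The number of candidates grows as a power of the library size, which is why the screening step would in practice need to be automated or preceded by a prediction step.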

Preferably, the natural binding domains are zinc finger binding domains.

Furthermore, the invention provides methods for designing a composite binding
polypeptide, comprising:

(a) providing information defining a target site;

(b) selecting, from a database of natural binding domains, a sequence of binding
domains, separated by linker sequences, which is predicted to bind to the target site;

(c) displaying the sequence of binding domains and linkers and optionally
assembling the binding polypeptide from a library of said domains.

In certain embodiments, the binding domains are zinc finger domains. In certain
embodiments, a binding domain sequence that will bind a particular target site is
predicted by the application of one or more rules that define target binding interactions
for the binding domains. In additional embodiments, a nucleotide sequence encoding the
binding domains is assembled and introduced into a cell such that the composite binding
polypeptide is expressed.

In one embodiment, zinc fingers can be considered to bind to a nucleic acid triplet, in 
which case domains can be selected according to one or more of the following rules: 

(a) if the 5' base in the triplet is G, then position +6 in the α-helix is Arg; or
position +6 is Ser or Thr and position ++2 is Asp;

(b) if the 5' base in the triplet is A, then position +6 in the α-helix is Gln and ++2
is not Asp;

(c) if the 5' base in the triplet is T, then position +6 in the α-helix is Ser or Thr
and position ++2 is Asp;

(d) if the 5' base in the triplet is C, then position +6 in the α-helix may be any
amino acid, provided that position ++2 in the α-helix is not Asp;

(e) if the central base in the triplet is G, then position +3 in the α-helix is His;

(f) if the central base in the triplet is A, then position +3 in the α-helix is Asn;

(g) if the central base in the triplet is T, then position +3 in the α-helix is Ala, Ser
or Val; provided that if it is Ala, then one of the residues at -1 or +6 is a small residue;

(h) if the central base in the triplet is C, then position +3 in the α-helix is Ser, Asp,
Glu, Leu, Thr or Val;

(i) if the 3' base in the triplet is G, then position -1 in the α-helix is Arg;

(j) if the 3' base in the triplet is A, then position -1 in the α-helix is Gln;

(k) if the 3' base in the triplet is T, then position -1 in the α-helix is Asn or Gln;

(l) if the 3' base in the triplet is C, then position -1 in the α-helix is Asp.
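The triplet rules (a) to (l) above can be read as a yes/no test applied to one finger and one triplet. The sketch below is offered under stated assumptions: the `Finger` record and its field names are hypothetical, the "++2" residue is treated simply as a fourth residue slot, and the `SMALL` set is a placeholder because the text does not list which residues count as "small".

```python
from dataclasses import dataclass

# Assumption: the text does not define "small residue"; this set is a placeholder.
SMALL = {"Gly", "Ala", "Ser"}


@dataclass(frozen=True)
class Finger:
    m1: str   # residue at helix position -1
    p3: str   # residue at helix position +3
    p6: str   # residue at helix position +6
    pp2: str  # residue at the position the rules call ++2


def five_prime_ok(base: str, f: Finger) -> bool:
    ser_thr_with_asp = f.p6 in {"Ser", "Thr"} and f.pp2 == "Asp"
    if base == "G":
        return f.p6 == "Arg" or ser_thr_with_asp      # rule (a)
    if base == "A":
        return f.p6 == "Gln" and f.pp2 != "Asp"       # rule (b)
    if base == "T":
        return ser_thr_with_asp                       # rule (c)
    if base == "C":
        return f.pp2 != "Asp"                         # rule (d)
    raise ValueError(f"unexpected base: {base!r}")


def central_ok(base: str, f: Finger) -> bool:
    if base == "G":
        return f.p3 == "His"                          # rule (e)
    if base == "A":
        return f.p3 == "Asn"                          # rule (f)
    if base == "T":                                   # rule (g), with the Ala proviso
        if f.p3 == "Ala":
            return f.m1 in SMALL or f.p6 in SMALL
        return f.p3 in {"Ser", "Val"}
    if base == "C":
        return f.p3 in {"Ser", "Asp", "Glu", "Leu", "Thr", "Val"}  # rule (h)
    raise ValueError(f"unexpected base: {base!r}")


def three_prime_ok(base: str, f: Finger) -> bool:
    allowed = {"G": {"Arg"}, "A": {"Gln"}, "T": {"Asn", "Gln"}, "C": {"Asp"}}  # (i)-(l)
    if base not in allowed:
        raise ValueError(f"unexpected base: {base!r}")
    return f.m1 in allowed[base]


def matches_triplet(triplet: str, f: Finger) -> bool:
    """Return True if the finger satisfies all three rule groups for a 5'->3' triplet."""
    five, mid, three = triplet.upper()
    return five_prime_ok(five, f) and central_ok(mid, f) and three_prime_ok(three, f)
```

For instance, a hypothetical finger with Arg at -1, His at +3 and Arg at +6 passes the test for the triplet GGG but fails for GCG, because rule (h) does not permit His at +3 opposite a central C.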

In a further embodiment, the zinc fingers can be considered to bind to a nucleic acid
quadruplet and domains can be selected according to one or more of the following rules:

(a) if base 4 in the quadruplet is G, then position +6 in the α-helix is Arg or Lys;

(b) if base 4 in the quadruplet is A, then position +6 in the α-helix is Glu, Asn or
Val;

(c) if base 4 in the quadruplet is T, then position +6 in the α-helix is Ser, Thr, Val
or Lys;

(d) if base 4 in the quadruplet is C, then position +6 in the α-helix is Ser, Thr, Val,
Ala, Glu or Asn;

(e) if base 3 in the quadruplet is G, then position +3 in the α-helix is His;

(f) if base 3 in the quadruplet is A, then position +3 in the α-helix is Asn;

(g) if base 3 in the quadruplet is T, then position +3 in the α-helix is Ala, Ser or
Val; provided that if it is Ala, then one of the residues at -1 or +6 is a small residue;

(h) if base 3 in the quadruplet is C, then position +3 in the α-helix is Ser, Asp,
Glu, Leu, Thr or Val;

(i) if base 2 in the quadruplet is G, then position -1 in the α-helix is Arg;

(j) if base 2 in the quadruplet is A, then position -1 in the α-helix is Gln;

(k) if base 2 in the quadruplet is T, then position -1 in the α-helix is His or Thr;

(l) if base 2 in the quadruplet is C, then position -1 in the α-helix is Asp or His;

(m) if base 1 in the quadruplet is G, then position +2 is Glu;

(n) if base 1 in the quadruplet is A, then position +2 is Arg or Gln;

(o) if base 1 in the quadruplet is C, then position +2 is Asn, Gln, Arg, His or Lys;

(p) if base 1 in the quadruplet is T, then position +2 is Ser or Thr.
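Because every rule in this quadruplet set is a simple membership condition, the set lends itself to a lookup table that answers the design question "which residues are permitted at each helix position for a given quadruplet?". The sketch below is illustrative; the names `RULES`, `HELIX_POSITION` and `permitted_residues` are hypothetical. The quadruplet is passed as a mapping from base number to base so that no assumption is made about strand orientation, and the Ala proviso of rule (g) is noted in a comment but not checked.

```python
from typing import Dict, Set

# Quadruplet base number -> helix position it is paired with in rules (a)-(p).
HELIX_POSITION = {4: "+6", 3: "+3", 2: "-1", 1: "+2"}

RULES: Dict[int, Dict[str, Set[str]]] = {
    4: {"G": {"Arg", "Lys"},                                 # (a)
        "A": {"Glu", "Asn", "Val"},                          # (b)
        "T": {"Ser", "Thr", "Val", "Lys"},                   # (c)
        "C": {"Ser", "Thr", "Val", "Ala", "Glu", "Asn"}},    # (d)
    3: {"G": {"His"},                                        # (e)
        "A": {"Asn"},                                        # (f)
        "T": {"Ala", "Ser", "Val"},                          # (g) Ala proviso NOT checked here
        "C": {"Ser", "Asp", "Glu", "Leu", "Thr", "Val"}},    # (h)
    2: {"G": {"Arg"},                                        # (i)
        "A": {"Gln"},                                        # (j)
        "T": {"His", "Thr"},                                 # (k)
        "C": {"Asp", "His"}},                                # (l)
    1: {"G": {"Glu"},                                        # (m)
        "A": {"Arg", "Gln"},                                 # (n)
        "C": {"Asn", "Gln", "Arg", "His", "Lys"},            # (o)
        "T": {"Ser", "Thr"}},                                # (p)
}


def permitted_residues(quadruplet: Dict[int, str]) -> Dict[str, Set[str]]:
    """Map each helix position to the residues the table allows for the given bases."""
    return {HELIX_POSITION[n]: RULES[n][quadruplet[n].upper()] for n in (4, 3, 2, 1)}
```

A natural finger from a library could then be accepted for a quadruplet when each of its four residues falls in the corresponding permitted set.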

In a preferred embodiment, zinc fingers are considered to bind to a nucleic acid
quadruplet and domains are selected according to one or more of the following rules:

(a) if base 4 in the quadruplet is G, then position +6 in the α-helix is Arg; or
position +6 is Ser or Thr and position ++2 is Asp;

(b) if base 4 in the quadruplet is A, then position +6 in the α-helix is Gln and ++2
is not Asp;

(c) if base 4 in the quadruplet is T, then position +6 in the α-helix is Ser or Thr
and position ++2 is Asp;

(d) if base 4 in the quadruplet is C, then position +6 in the α-helix may be any
amino acid, provided that position ++2 in the α-helix is not Asp;

(e) if base 3 in the quadruplet is G, then position +3 in the α-helix is His;

(f) if base 3 in the quadruplet is A, then position +3 in the α-helix is Asn;

(g) if base 3 in the quadruplet is T, then position +3 in the α-helix is Ala, Ser or
Val; provided that if it is Ala, then one of the residues at -1 or +6 is a small residue;

(h) if base 3 in the quadruplet is C, then position +3 in the α-helix is Ser, Asp,
Glu, Leu, Thr or Val;

(i) if base 2 in the quadruplet is G, then position -1 in the α-helix is Arg;

(j) if base 2 in the quadruplet is A, then position -1 in the α-helix is Gln;

(k) if base 2 in the quadruplet is T, then position -1 in the α-helix is Asn or Gln;

(l) if base 2 in the quadruplet is C, then position -1 in the α-helix is Asp;

(m) if base 1 in the quadruplet is G, then position +2 is Asp;

(n) if base 1 in the quadruplet is A, then position +2 is not Asp;

(o) if base 1 in the quadruplet is C, then position +2 is not Asp;

(p) if base 1 in the quadruplet is T, then position +2 is Ser or Thr.

Two or more composite polypeptides comprising two or more domains which are 
selected for binding to two or more target sites can be combined to provide a composite 
polypeptide which binds to an aggregate binding site comprising the two or more target 
binding sites. 

In a still further aspect, the invention provides a computer-implemented method for
designing a zinc finger polypeptide that binds to a target nucleic acid sequence,
comprising the steps of:

(a) providing a system comprising at least storage means for storing data relating
to a library of zinc fingers; storage means for storing a rule table; means for inputting
target nucleic acid sequence data; processing means for generating a result; and means for
outputting the result;

(b) inputting sequence data for a target nucleic acid molecule;

(c) defining a first target zinc finger binding site in said nucleic acid molecule;

(d) interrogating the zinc finger library and rule table storage means, comparing
zinc fingers to the target zinc finger binding site according to the rule table and selecting
zinc finger data identifying a zinc finger capable of binding to said target site;

(e) defining at least one further target zinc finger binding site and repeating step
(d); and

(f) outputting the selected zinc finger data.

Such a method may further comprise sending instructions to an automated chemical
synthesis system to assemble a zinc finger polypeptide as defined by the zinc finger data
obtained in (f).
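Steps (b) to (f) describe a loop over successive binding sites in which each site is compared against the stored library using the stored rule table. The sketch below shows one possible shape for that loop, assuming triplet-sized sites. All names (`split_sites`, `design`, `demo_rule`, the library records) are hypothetical. The `demo_rule` is deliberately reduced: it applies only the central-base and 3'-base conditions of the triplet rules, excludes Ala at +3 because the "small residue" proviso is not modelled, and ignores the 5' base entirely.

```python
from typing import Callable, Dict, List, Tuple

FingerRecord = Dict[str, str]                 # e.g. {"name": ..., "-1": ..., "+3": ...}
Rule = Callable[[FingerRecord, str], bool]


def split_sites(target: str, width: int = 3) -> List[str]:
    """Steps (c) and (e): define consecutive binding sites of `width` bases."""
    seq = target.upper().replace("-", "")
    if not seq or len(seq) % width:
        raise ValueError("target length must be a non-zero multiple of the site width")
    return [seq[i:i + width] for i in range(0, len(seq), width)]


def design(target: str, library: List[FingerRecord], rule: Rule) -> List[Tuple[str, List[str]]]:
    """Steps (d) and (f): for each site, list the library fingers the rule accepts."""
    return [(site, [f["name"] for f in library if rule(f, site)])
            for site in split_sites(target)]


# Reduced demonstration rule table (assumption: a subset of the triplet rules only).
MINUS1 = {"G": {"Arg"}, "A": {"Gln"}, "T": {"Asn", "Gln"}, "C": {"Asp"}}
PLUS3 = {"G": {"His"}, "A": {"Asn"}, "T": {"Ser", "Val"},
         "C": {"Ser", "Asp", "Glu", "Leu", "Thr", "Val"}}


def demo_rule(finger: FingerRecord, site: str) -> bool:
    """Check +3 against the central base and -1 against the 3' base of a triplet."""
    return finger["+3"] in PLUS3[site[1]] and finger["-1"] in MINUS1[site[2]]
```

Passing the rule in as a callable keeps the site-walking logic separate from the rule table, so either of the quadruplet rule sets could be substituted without changing `design`.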


In additional embodiments, the sequence of one or more oligonucleotides encoding a 
composite binding polypeptide can be determined from the sequence of a composite 
binding polypeptide, and the one or more oligonucleotides can be synthesized by any 
number of well-known methods. 


Preferably, a composite binding polypeptide is tested for binding to a target sequence, and
data from said testing is used to select, from a plurality of possibilities, a composite
binding polypeptide that binds with optimal affinity and specificity to the target site.

Advantageously, two or more zinc finger polypeptides are combined to form a zinc finger
polypeptide capable of binding to an aggregate binding site comprising two or more
target sites.




The rule table preferably comprises rules as set forth above.

BRIEF DESCRIPTION OF THE FIGURES

Figure 1 shows a flowchart depicting part of the logic used in the selection of zinc
fingers from a natural library in accordance with the invention. The logic set forth in
Figure 1 may be supplemented, for example using rules relating to zinc finger overlap.
Functional testing of zinc fingers for binding to the desired binding site may be
implemented in an automated fashion and integrated with the zinc finger design system.

Figure 2 is a schematic representation of the human zinc finger mini-library construction
procedure. Synthetic zinc finger coding oligonucleotides are assembled into full-length
ds expression constructs by overlap PCR.

Figure 3 is a schematic representation of the fluorescent ELISA assay used to detect zinc
finger peptides bound to double stranded DNA target sites. Streptavidin (7), biotinylated
DNA target (5) linked to biotin (6), 3-finger peptide (4) fused to HA-tag (3), anti-HA
antibody (2) fused to horseradish peroxidase (HRP, 1).

Figure 4 depicts ELISA scores of 384 library 2 constructs screened against the
5'-GCG-TGG-GCG-3' (SEQ ID NO:4) target site. Six constructs showed significant binding, and
are termed C8, G16, I19, I23, J19 and K19, according to their coordinates on the 384-well
plate.

Figure 5 depicts ELISA scores of selected library 2 members, B10, C8, G16, I23, J19
and K19, against different DNA target sites. The sequences of the target sites are (from
back of graph to front): 5'-GCG-TGG-GCG-3' (SEQ ID NO:5); 5'-CCA-CTC-GGC-3'
(SEQ ID NO:6); 5'-CCT-AGG-GGG-3' (SEQ ID NO:7); 5'-GGA-TAA-GCG-3' (SEQ
ID NO:8); 5'-GGG-AGG-CCT-3' (SEQ ID NO:9); 5'-GCG-TAA-GGA-3' (SEQ ID
NO:10); 5'-GCG-GGG-GGA-3' (SEQ ID NO:11); and no DNA control (front row).

Figure 6 depicts a schematic representation of the 3-zinc finger library constructed
according to the procedure described in Example 2.

DETAILED DESCRIPTION 


Definitions

Unless defined otherwise, all technical and scientific terms used herein have the same
meaning as commonly understood by one of ordinary skill in the art (e.g., in cell culture,
molecular genetics, nucleic acid chemistry, hybridisation techniques and biochemistry).
The practice of the present invention will employ, unless otherwise indicated,
conventional techniques of chemistry, molecular biology, microbiology, recombinant
DNA, immunology, chemical methods, pharmaceutical formulations and delivery and
treatment of patients, which are within the capabilities of a person of ordinary skill in the
art. Such techniques are explained in the literature. See, for example, J. Sambrook, E. F.
Fritsch, and T. Maniatis, 1989, Molecular Cloning: A Laboratory Manual, Second
Edition, Books 1-3, Cold Spring Harbor Laboratory Press; Ausubel, F. M. et al. (1995
and periodic supplements), Current Protocols in Molecular Biology, ch. 9, 13, and 16,
John Wiley & Sons, New York, N.Y.; B. Roe, J. Crabtree, and A. Kahn, 1996, DNA
Isolation and Sequencing: Essential Techniques, John Wiley & Sons; J. M. Polak and
James O'D. McGee, 1990, In Situ Hybridisation: Principles and Practice, Oxford
University Press; M. J. Gait (Editor), 1984, Oligonucleotide Synthesis: A Practical
Approach, IRL Press; and D. M. J. Lilley and J. E. Dahlberg, 1992, Methods of
Enzymology: DNA Structure Part A: Synthesis and Physical Analysis of DNA, Methods in
Enzymology, Academic Press. Each of these general texts is herein incorporated by
reference.

The term "library" is used according to its common usage in the art, to denote a collection
of different polypeptides or, preferably, a collection of nucleic acids encoding different
polypeptides. The libraries of natural zinc finger peptides referred to herein comprise or
encode a repertoire of polypeptides of different sequences, each of which has a preferred
binding sequence.

The terms "polypeptide", "peptide" and "protein" are used interchangeably to refer to a
polymer of amino acid residues, preferably including naturally occurring amino acid
residues. Artificial amino acid residues are also within the scope of the invention, but the
exclusive use of naturally-occurring amino acids is preferred in order to maintain the
natural character of the binding domains. There are 20 common amino acids, each specified
by a different arrangement of three adjacent DNA nucleotides by the genetic code. These
are the building blocks of proteins. The amino acids are joined together in a strictly
ordered chain by peptide bonds, and their sequence determines each polypeptide molecule. The 20
common amino acids are: alanine, arginine, aspartic acid, glutamic acid, glutamine, 
glycine, histidine, isoleucine, leucine, phenylalanine, proline, serine, threonine, 
tryptophan, tyrosine, valine, cysteine, methionine, lysine, and asparagine. Virtually all of 
these amino acids (except glycine) possess an asymmetric carbon atom, and thus are 
potentially chiral in nature. 

As used herein, "nucleic acid" includes both RNA and DNA, and nucleic acids
constructed from natural nucleic acid bases or synthetic bases, or mixtures thereof.
Modified nucleic acids such as, for example, PNAs and morpholino nucleic acids, are 
also included in this definition. 

A "gene", as used herein, is the segment of nucleic acid (typically DNA) that is involved 
in producing a polypeptide chain or ribonucleic acid gene product. It includes regions 
preceding and following the coding region (leader and trailer) as well as intervening 
sequences (introns) between individual coding segments (exons). Preferably, "gene" 
includes the necessary control sequences for gene expression, as well as the coding region 
encoding the gene product. 

A "binding polypeptide" is a polypeptide capable of binding to a specific target.
Although, as is well known, polypeptides are capable of non-specific binding to a wide
range of substrates, it is also known that certain polypeptides, such as antibodies and
other members of the immunoglobulin superfamily, zinc fingers, leucine zipper
polypeptides, peptide aptamers and the like can bind specifically to target sites or
molecules. Generally, specific binding is preferably achieved with a dissociation constant
(Kd) of 100 μM or lower; preferably 10 μM or better; preferably 1 μM or better; and ideally
0.5 μM or better. Binding polypeptides can be nucleic acid binding polypeptides which
bind to nucleic acid in a target sequence-specific manner, such as zinc finger
polypeptides. Unless specifically noted, no difference is intended herein between terms
such as "peptide", "polypeptide" and "protein".
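The graded dissociation-constant thresholds given above (100, 10, 1 and 0.5 micromolar) can be expressed as a small classifier. This is a minimal sketch; the function name and the tier labels are hypothetical and serve only to make the ordering of the thresholds explicit.

```python
def binding_tier(kd_micromolar: float) -> str:
    """Classify a dissociation constant (in micromolar) against the stated thresholds.

    A lower Kd means tighter binding, so the tightest tier is tested first.
    """
    if kd_micromolar <= 0:
        raise ValueError("Kd must be positive")
    if kd_micromolar <= 0.5:
        return "ideal (<= 0.5 uM)"
    if kd_micromolar <= 1:
        return "<= 1 uM"
    if kd_micromolar <= 10:
        return "<= 10 uM"
    if kd_micromolar <= 100:
        return "<= 100 uM"
    return "above 100 uM"
```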

A "natural binding polypeptide" is a binding polypeptide encoded by the genome of a
living organism such as, for example, a plant or animal. 


A "composite" polypeptide is a polypeptide that is assembled from a plurality of
components. In a preferred embodiment, the invention provides composite binding
polypeptides that are assembled from a plurality of individual natural binding domains as
set forth in detail herein. Typically, such domains are zinc finger nucleic acid binding
domains.

A "natural binding domain" (or module) is a domain of a naturally occurring polypeptide
that is capable of specific binding to a target as defined above. The terms "domain" and
"module", according to their ordinary signification in the art, refer to a discrete
continuous part of the amino acid sequence of a polypeptide that can be equated with a
particular function. Protein domains or modules are largely structurally independent and
can retain their structure and function in different environments. In certain embodiments,
a natural binding domain or module is a zinc finger that binds a triplet or quadruplet
nucleotide sequence.


Preferably, each of the individual natural binding domains that make up a composite
binding polypeptide contains no changes in sequence, as compared to the natural
sequence. However, those skilled in the art will understand that certain changes including
conservative amino acid substitutions, as well as additions or deletions, may be made
without altering the function of a domain. Moreover, where the changes are consistent
with sequences common to the species from which the domain is derived, such as for
example being present in consensus sequences, they are unlikely to give rise to
immunological problems.

Conservative amino acid substitutions may be made, for example according to Table 1.
Amino acids in the same block in the second column and preferably in the same line in
the third column may be substituted for one another:

Table 1

ALIPHATIC    Non-polar            G A P
                                  I L V
             Polar - uncharged    C S T M
                                  N Q
             Polar - charged      D E
                                  K R
AROMATIC                          H F W Y



A domain is "derived" from a protein if it is effectively removed from a
naturally-occurring protein for use in a composite binding polypeptide. Removal may be physical
removal, by cleavage of the protein; more commonly, however, the sequence of the
domain is determined and the domain is synthesised by protein synthesis techniques to be
a copy of the naturally-occurring domain. Alternatively, a nucleic acid encoding the
domain is synthesized and expressed in a cell. In vitro synthesised domains, or in vitro
synthesized polynucleotides encoding naturally-occurring domains, are considered to be
"derived" from the natural protein if they recapitulate the sequence of the
naturally-occurring domain.

A "target" is a molecule or part thereof to which a binding polypeptide or a binding
domain is capable of specific binding. The "natural target" of a binding polypeptide is
the target to which that polypeptide binds in nature; e.g., in a living cell. In the case of
zinc finger polypeptides, for instance, the natural target is the nucleotide sequence to
which the polypeptide binds in a living cell. Sequences other than the natural target, as
defined herein, to which a zinc finger polypeptide may bind in vitro are not natural
targets.


In the case of nucleic acid binding polypeptides, therefore, the term "target" may be
substituted or supplemented with "binding site" or "binding sequence." Where binding
sites are assembled to form larger binding sites, which are bound by multi-domain
binding polypeptides, such binding sites are referred to as "aggregate binding sites",
indicating that they are formed by the juxtaposition of two or more individual binding
sites. The aggregate binding sites can comprise contiguous individual binding sites, or
individual binding sites interspersed by one or more intervening nucleotides or sequence
of nucleotides.
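An aggregate binding site, whether contiguous or interrupted by intervening nucleotides, can be located in a DNA sequence with a regular expression. The sketch below is illustrative only: the function name `aggregate_pattern` and the `max_gap` parameter are hypothetical, and the example site sequences are borrowed from the target sites listed for Figure 5 purely as test strings.

```python
import re
from typing import List


def aggregate_pattern(sites: List[str], max_gap: int = 0) -> re.Pattern:
    """Build a regex matching the individual sites in order.

    Up to `max_gap` arbitrary nucleotides may lie between neighbouring sites;
    max_gap=0 describes a strictly contiguous aggregate site.
    """
    if max_gap < 0:
        raise ValueError("max_gap must not be negative")
    cleaned = [s.upper().replace("-", "") for s in sites]
    for s in cleaned:
        if not s or set(s) - set("ACGT"):
            raise ValueError(f"not a DNA site: {s!r}")
    gap = f"[ACGT]{{0,{max_gap}}}"   # e.g. "[ACGT]{0,2}" when max_gap is 2
    return re.compile(gap.join(cleaned))
```

Calling `aggregate_pattern(["GCG-TGG", "GCG-GGG"], max_gap=2).search(sequence)` returns a match object when the two sites occur in order with at most two intervening bases, and `None` otherwise.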

The present invention relates to naturally-occurring zinc fingers and their use as specific
nucleic acid binding modules in combinations not present in nature. This invention
provides methods of determining and/or predicting the nucleotide binding specificities of
natural zinc finger modules. Also provided are methods of constructing poly-zinc finger
peptides containing at least one natural zinc finger module, from libraries of natural zinc
finger peptides, and methods of screening such peptides to determine their preferred
nucleotide binding specificity. Moreover, the invention provides for the use of
combinations of such natural zinc finger modules in poly-zinc finger peptides not present
in nature, to bind any desired nucleotide sequence.

Poly-zinc finger peptides of this invention may contain 2, 3, 4, 5, 6 or more zinc finger
modules. Natural zinc finger modules of this invention may preferably be linked by
canonical, flexible or structured linkers, as set out below and in WO 01/53480, the
disclosure of which is hereby incorporated by reference. More preferably, the linkers are
canonical linkers such as -TGEKP- (SEQ ID NO:3).
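Joining modules with the canonical linker amounts to concatenating the module sequences with TGEKP between each neighbouring pair. The sketch below illustrates this at the level of amino acid strings; the function name `assemble` is hypothetical, and the domain strings used in any example call are placeholders rather than real zinc finger sequences.

```python
from typing import Sequence

CANONICAL_LINKER = "TGEKP"  # SEQ ID NO:3 as given in the text


def assemble(domains: Sequence[str], linker: str = CANONICAL_LINKER) -> str:
    """Concatenate domain sequences, placing the linker between neighbours."""
    if len(domains) < 2:
        raise ValueError("a composite polypeptide needs at least two domains")
    return linker.join(domains)
```

A flexible or structured linker could be supplied through the `linker` argument in place of the canonical one.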

The poly-zinc finger peptides of this invention can be given useful biological functions by
the addition of effector domains, creating chimeric zinc finger peptides. Preferably, such
chimeric zinc finger peptides may be used to up- or down-regulate desired genes, in vitro
or in vivo. Preferable effector domains include transcriptional repressor domains,
transcriptional activator domains, transcriptional insulator domains, chromatin
remodelling domains, enzymatic domains, and signalling / targeting sequences or
domains. To cause a desired biological effect, composite binding polypeptides can bind to
one or more suitable nucleotide sequences in vivo or in vitro. Preferred DNA regions
from which to effect the up- or down-regulation of specific genes include promoters,
enhancers or locus control regions (LCRs). Other suitable regions within genomes,
which may provide useful targets for composite binding polypeptides, include telomeres
and centromeres.

The expression of many genes is also achieved by controlling the fate of the associated 
RNA transcript. RNA molecules often contain sites for RNA-binding proteins, which
determine RNA half-life. Hence, composite binding polypeptides can also control 
endogenous gene expression by specifically targeting RNA transcripts to either increase 
or decrease their half-life within a cell. 

Composite binding polypeptides can also be fused to epitope tags, which can be detected
by antibodies, and may therefore be used to signal the presence or location of a particular
nucleotide sequence in a mixed pool of nucleic acids, or immobilised on the surface of a
chip or other such surface.

Intracellular localization of composite binding polypeptides can be regulated, for
example, by fusion to a localization domain such as a nuclear localization sequence
or a localization domain as disclosed, for example, in PCT/US01/42377.

a. Nucleic Acid Binding Polypeptides 

This invention preferably relates to nucleic acid binding polypeptides. Preferably, the
binding polypeptides of the invention are DNA binding polypeptides. Particularly
preferred examples of nucleic acid binding polypeptides are zinc finger peptides.

Zinc finger peptides typically contain strings of small nucleic acid binding domains, each
stabilised by the co-ordination of zinc. These individual domains are also referred to as
"fingers" and "modules". A zinc finger recognises and binds to a nucleic acid triplet, or
an overlapping quadruplet, in a DNA target sequence. However, zinc fingers are also
known to bind RNA and proteins. Clemens, K. R. et al. (1993) Science 260: 530-533;
Bogenhagen, D. F. (1993) Mol. Cell. Biol. 13: 5149-5158; Searles, M. A. et al. (2000)
J. Mol. Biol. 301: 47-60; Mackay, J. P. & Crossley, M. (1998) Trends Biochem. Sci. 23:
1-4.

Preferably, there are 2 or more zinc fingers, for example 2, 3, 4, 5, 6, or 7 zinc fingers, in
each zinc finger polypeptide. Advantageously, there are 3 or more zinc fingers in each 
zinc finger polypeptide. 

All of the DNA binding residue positions of zinc finger peptides, as referred to herein, are
numbered from the first residue in the α-helix of the finger, ranging from +1 to +9. "-1"
refers to the residue in the framework structure immediately preceding the α-helix in a
zinc finger peptide. Residues referred to as "++" are residues present in an adjacent
(C-terminal) peptide. Where there is no C-terminal adjacent peptide, "++" interactions do
not operate.

The α-helix of a zinc finger peptide aligns antiparallel to the target nucleic acid strand,
such that the primary nucleic acid sequence is arranged 3' to 5' in order to correspond
with the N-terminal to C-terminal sequence of the zinc finger peptide. Since nucleic acid
sequences are conventionally written 5' to 3', and amino acid sequences N-terminus to
C-terminus, the result is that when a target nucleic acid sequence and a zinc finger
peptide are aligned according to convention, the primary interaction of the zinc finger
peptide is with the "minus" strand of the nucleic acid sequence, since it is this strand
which is aligned 3' to 5'. These conventions are followed in the nomenclature used
herein. It should be noted, however, that in nature certain zinc finger modules, such as
zinc finger 4 of the protein GLI, bind to the "plus" strand of the nucleic acid sequence.
See Suzuki et al. (1994) Nucl. Acids Res. 22: 3397-3405; and Pavletich & Pabo (1993)
Science 261: 1701-1707. The present invention encompasses incorporation of such zinc
finger peptides into DNA binding molecules.
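
The antiparallel arrangement described above can be illustrated with a minimal sketch. It assumes, purely for simplicity, non-overlapping 3 bp subsites and a target string written 5' to 3' for the strand read by the recognition helices; the function name assign_subsites is hypothetical and is not part of the disclosure. Under these assumptions the N-terminal finger pairs with the 3'-most triplet.

```python
def assign_subsites(target_5to3: str, n_fingers: int) -> list[tuple[int, str]]:
    """Pair each finger (1 = N-terminal) with a 3 bp subsite of the target.

    Illustrative assumptions: non-overlapping triplets, and a target
    written 5'->3'. Because the peptide runs antiparallel to the strand
    it reads, finger 1 pairs with the 3'-most triplet.
    """
    if len(target_5to3) != 3 * n_fingers:
        raise ValueError("target length must be 3 x number of fingers")
    triplets = [target_5to3[i:i + 3] for i in range(0, len(target_5to3), 3)]
    # Reverse so that the N-terminal finger receives the 3'-most triplet.
    return list(enumerate(reversed(triplets), start=1))


if __name__ == "__main__":
    for finger, triplet in assign_subsites("AAACCCGGG", 3):
        print(f"finger {finger}: 5'-{triplet}-3'")
```

The sketch ignores the overlapping-quadruplet contacts discussed elsewhere in this disclosure, and it does not model fingers (such as finger 4 of GLI) that contact the opposite strand.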

Natural Zinc Finger Peptides. 

In certain embodiments, this invention relates to natural zinc finger modules. As used 
herein, the term 'natural' with reference to a zinc finger, means that the DNA sequence
which encodes a particular zinc finger, whether normally expressed in vivo or not, is
found in nature, i.e. is part of the genome of a cell. A natural human zinc finger is one
which is endogenous to the human genome, a natural mouse zinc finger is found in the
mouse genome, and a natural viral zinc finger is found in a viral genome, etc. Natural
zinc finger genes which have become integrated into the genome of a heterologous
species by natural means, e.g., integration of a viral genome into a host genome, are
considered to be endogenous to the host species within the context of this disclosure. A
zinc finger module constructed or produced in vitro or extracted from an in vivo source is
considered to be natural if its amino acid sequence matches the amino acid
sequence encoded by its natural gene. The DNA sequence of the natural gene is not the
defining aspect. Thus, polynucleotides encoding natural zinc finger modules may have a
different sequence from that of the naturally-occurring sequence encoding the module,
e.g., to adjust codon usage to optimise expression of the module in a particular expression
system.



Preferably, sequences of zinc fingers used in the present invention are not mutated from 
their natural form. Advantageously, the natural zinc finger polypeptides are expressed in
nature. 



A natural zinc finger binding motif is a structure well known to those in the art and
defined in, for example, Miller et al. (1985) EMBO J. 4: 1609-1614; Berg (1988) Proc.
Natl. Acad. Sci. USA 85: 99-102; Lee et al. (1989) Science 245: 635-637; see also
International patent applications WO 96/06166 and WO 96/32475, incorporated herein by
reference.

In general, a natural zinc finger framework has the structure:

X_{0-2} C X_{1-5} C X_{9-14} H X_{3-6} H/C (SEQ ID NO:12)

where X is any amino acid, and the numbers in subscript indicate the possible numbers of
residues represented by X (Formula A).
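
As an illustration only, a framework of this kind can be expressed as a regular expression. The sketch below assumes Formula A reads X(0-2) C X(1-5) C X(9-14) H X(3-6) H/C, uses one-letter amino acid codes, and takes its test sequences from the three Zif268-based fingers of Table 2 (written without linkers). The names FORMULA_A and matches_formula_a are hypothetical.

```python
import re

# Formula A as a regular expression (assumed reading of the framework):
# X(0-2) C X(1-5) C X(9-14) H X(3-6) H/C, where X is any amino acid.
FORMULA_A = re.compile(r"[A-Z]{0,2}C[A-Z]{1,5}C[A-Z]{9,14}H[A-Z]{3,6}[HC]")


def matches_formula_a(module: str) -> bool:
    """Return True if the whole module sequence fits the framework."""
    return FORMULA_A.fullmatch(module) is not None


if __name__ == "__main__":
    # Finger sequences from Table 2 (Zif268-based), without linkers.
    for seq in ("YACPVESCDRRFSRSDELTRHIRIH",
                "PQCRICMRNFSRSDHLSTHIRTH",
                "FACDICGRKFARSDERKRHTKIH"):
        print(seq, matches_formula_a(seq))
```

A pattern match of this kind only tests spacing of the zinc-coordinating residues; it says nothing about whether a sequence is natural or what it binds.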

In a preferred aspect of the present invention, natural zinc finger nucleic acid binding
motifs may be represented as motifs having the following primary structure:

X_{0-2} C X_{1-5} C X_{2-7} X  X  X  X  X  X  X  H  X_{3-6} H/C   (SEQ ID NO:14 and SEQ ID NO:13)
                           -1  1  2  3  4  5  6  7

where X is any amino acid, and the numbers in subscript indicate the possible numbers of
residues represented by X (Formula A'). The numbers -1 through 7 refer to amino acid
position with respect to the beginning of the alpha-helical region of the zinc finger.

The Cys and His residues, which together co-ordinate the zinc metal atom, are marked in 
bold text and are usually invariant. However, all naturally-occurring zinc finger modules,
even if they diverge from the above formula, are encompassed within the scope of this 
invention. 

Zinc finger modules of formula A' are often arranged in tandem within a natural zinc
finger polypeptide, such that a zinc finger containing protein may have 2, 3, 4, 5, 6, 7, 8,
9 or more individual zinc finger motifs. In such a protein, individual zinc fingers are
joined to each other by a polypeptide sequence known as a linker. Generally, such a 
natural linker lacks secondary structure, although the amino acids within the linker may 
form local interactions when the protein is bound to its target site. By 'linker sequence' is 
meant an amino acid sequence that links together adjacent zinc finger modules. For 
example, in a natural zinc finger protein, the linker sequence is the amino acid sequence 
which lies between the last residue of the α-helix in a zinc finger and the first residue of
the β-sheet in the next zinc finger. The linker sequence therefore joins together two zinc
fingers. For the purposes of the present invention, the last amino acid of the α-helix in a
zinc finger is considered to be the final zinc coordinating histidine (or cysteine) residue, 
while the first amino acid of the following finger is generally a tyrosine / phenylalanine or 
another hydrophobic residue. Since some natural zinc fingers do not start with a 
hydrophobic residue (see Appendices), the start of a finger is sometimes harder to define 
from amino acid sequence (or indeed zinc finger structure), and so some flexibility must
be allowed in this definition. Accordingly, in a natural zinc finger protein, threonine is 
often considered to be the first residue in the linker, and proline is the last residue of the 
linker. Thus, for example, in the natural Zif268 peptide the linker sequence is
-TG(E/Q)(K/R)P- (SEQ ID NO:15). Although natural linkers can vary greatly in terms of
amino acid sequence and length, on the basis of sequence homology, the canonical
natural linker sequence is considered to be -TGEKP- (SEQ ID NO:3). Hence, the
preferred linker sequence to join zinc finger modules of the present invention is
-TGEKP-.



Additionally, a 'leader' peptide may be added to the N-terminal zinc finger of a poly-zinc
finger peptide to aid its expression, without changing the sequence of the natural zinc
finger module. Preferably, the leader peptide is MAEERP (SEQ ID NO:16) or MAERP
(SEQ ID NO:17).



In general, naturally occurring zinc finger modules may be selected from those proteins
for which the DNA binding specificity is already known. For example, these may be the
proteins for which a crystal structure has been resolved: namely Zif268 (Elrod-Erickson
et al. (1996) Structure 4: 1171-1180), GLI (Pavletich & Pabo (1993) Science 261:
1701-1707), Tramtrack (Fairall et al. (1993) Nature 366: 483-487) and YY1 (Houbaviy et
al. (1996) Proc. Natl. Acad. Sci. USA 93: 13577-13582). Furthermore, the sequence
specificity of many naturally-occurring zinc fingers and zinc finger proteins is known.
In addition, this invention further provides for the determination of the binding specificity
of natural zinc finger modules for use in the present invention. See "Prediction of
Binding Specificity," infra.

Poly-Zinc Finger Peptides. 



It is desirable that a 'designer' transcription factor for uses such as gene therapy
and in transgenic organisms should have the ability to target virtually unique sites within
any genome. For complex genomes such as in humans, an address of at least 16 bp is
required to specify a potentially unique DNA sequence. Shorter DNA sequences have a
significant probability of appearing several times in a genome, raising the possibility of
obtaining undesirable non-specific gene targeting with a designed transcription factor
targeted to such a shorter sequence. As individual zinc fingers only bind 3 to 4
nucleotides, it is therefore necessary to construct multi-finger polypeptides to target these
longer sequences. A six-zinc finger peptide (with an 18 bp recognition sequence) could,
in theory, be used for the specific recognition of a single target site and hence, the
specific regulation of a single gene within any genome. In addition, a significant increase
in binding affinity might also be expected, compared to a protein with fewer fingers. In
simple terms, if a three-finger peptide (with a 9 bp recognition sequence) binds DNA with
nanomolar affinity, two tandemly linked three-finger peptides might be expected to bind
an 18 bp sequence with an affinity of 10^-15 to 10^-18 M. However, most previous attempts at
producing high-affinity 6-finger peptides (poly-zinc finger peptides) based on fusions of
two 3-finger domains have been unsuccessful in generating much of an improvement in
affinity over 3-finger peptides. Liu, Q., Segal, D. J., Ghiara, J. B. & Barbas, C. F. III
(1997) Proc. Natl. Acad. Sci. USA 94: 5525-5530; Kim, J-S. & Pabo, C. O. (1998) Proc.
Natl. Acad. Sci. USA 95: 2812-2817; Kamiuchi, T., Abe, K., Imanishi, M., Kaji, T.,
Nagaoka, M. & Sugiura, Y. (1998) Biochemistry 37: 13827-13834. To optimise both the
affinity and specificity of 6-finger peptides, a fusion of three 2-finger domains has been
shown to be advantageous. Moore, M., Klug, A. & Choo, Y. (2001) Proc. Natl. Acad.
Sci. USA 98: 1437-1441; and WO 01/53480. Therefore, in one embodiment, 2-finger
units are linked to make poly-zinc finger nucleotide-binding domains. A pool of 4096
such 2-finger units, that recognise all possible 6 bp sequences (4^6 = 4096), represents an
archive sufficient to rapidly create universal nucleic acid recognition, by simple linkage,
in an "off-the-shelf" manner. See Moore et al., supra and WO 01/53480.
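
The counting argument in the preceding paragraph can be checked with a short calculation. The sketch below is a simplification resting on stated assumptions: it treats a genome as a random sequence with equal base frequencies, considers one strand only, and uses an assumed round figure of 3 x 10^9 bp for the haploid human genome. The function names are hypothetical.

```python
def n_sequences(length_bp: int) -> int:
    """Number of distinct DNA sequences of a given length (4 bases per position)."""
    return 4 ** length_bp


def expected_occurrences(length_bp: int, genome_bp: float) -> float:
    """Expected chance matches of one specific site on one strand.

    Assumes equal base frequencies and independent positions, which
    real genomes only approximate.
    """
    return genome_bp / n_sequences(length_bp)


if __name__ == "__main__":
    HUMAN_GENOME_BP = 3.0e9  # assumed round figure for the haploid human genome
    for n in (3, 4, 6, 9, 16, 18):
        print(n, n_sequences(n), expected_occurrences(n, HUMAN_GENOME_BP))
```

Under these assumptions a 9 bp site is expected to recur many thousands of times, whereas the expected number of chance matches falls below one at about 16 bp, in line with the address length given above.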

Poly-zinc finger peptides according to this invention may be constructed
containing 2, 3, 4, 5, 6 or more zinc finger modules. Such poly-zinc finger peptides may
contain inter-finger linkers other than the canonical (TGEKP) linker sequence, as
described, for example, in WO 01/53479; Moore, M., Choo, Y. & Klug, A. (2001) Proc.
Natl. Acad. Sci. USA 98: 1432-1436; and Moore, M., Klug, A. & Choo, Y. (2001) Proc.
Natl. Acad. Sci. USA 98: 1437-1441. Briefly, linker sequences may be flexible or
structured but, in general, will not form base-specific interactions with the target
nucleotide sequence. A 'flexible' linker is defined as one which does not form a specific
secondary structure in solution, whereas a 'structured' linker is defined as one that adopts
a particular secondary structure in solution. Preferably, flexible linkers include the
sequences GGERP (SEQ ID NO:18), GSERP (SEQ ID NO:19), GGGGSERP (SEQ ID
NO:20), GGGGSGGSERP (SEQ ID NO:21), GGGGSGGSGGSERP (SEQ ID NO:22),



WO 02/099084    PCT/US02/22272

GGGGSGGSGGSGGSGGSERP (SEQ ID NO:23). Preferably, the structured linker
comprises an amino acid sequence that is not capable of specifically binding nucleic acid.
More preferably, the structured linker comprises the amino acid sequence of TFIIIA
finger IV. Alternatively, or in addition, the structured linker is derived from a zinc finger
by mutation of one or more of its base contacting residues to reduce or abolish nucleic
acid binding activity of the zinc finger. The zinc finger may be finger 2 of wild type
Zif268 mutated at positions -1, 2, 3 and/or 6.

In one embodiment, this invention provides for the construction and screening of
poly-zinc finger peptides containing at least one natural zinc finger module.

In another embodiment, this invention provides for the construction and screening of
poly-zinc finger peptides containing at least one natural zinc finger module, linked with
the canonical linker sequence -TGEKP- (SEQ ID NO:3).

In one embodiment, methods for the construction and use of poly-zinc finger peptides
comprising natural zinc finger modules are provided.

In another embodiment, methods for the construction and use of poly-zinc finger peptides
comprising natural zinc finger modules, linked with the canonical linker sequence
-TGEKP- (SEQ ID NO:3), are provided.

In a further embodiment, methods for the construction and use of poly-zinc finger
peptides comprising at least one natural zinc finger module, containing either flexible or
structured linkers (as described above and in WO 01/53480), are provided.

b. Advantages of Natural Zinc Finger Modules 

Zinc finger modules are compact and stable structures of approximately 30 amino acids,
which contain the full information required to bind a nucleic acid triplet or overlapping
quadruplet. As such, they have proven to be extremely versatile scaffolds for engineering
novel DNA-binding domains. See, for example, Rebar, E. J. & Pabo, C. O. (1994)
Science 263, 671-673; Jamieson, A. C., Kim, S.-H. & Wells, J. A. (1994) Biochemistry
33, 5689-5695; Choo, Y. & Klug, A. (1994) Proc. Natl. Acad. Sci. USA 91,
11163-11167; Choo, Y., Sanchez-Garcia, I. & Klug, A. (1994) Nature 372, 642-645; Wu, H.,
Yang, W.-P. & Barbas III, C. F. (1995) Proc. Natl. Acad. Sci. USA 92, 344-348;
Greisman, H. A. & Pabo, C. O. (1997) Science 275, 657-661; Isalan, M., Klug, A. &
Choo, Y. (1998) Biochemistry 37, 12026-12033; Choo, Y. (1998) Nature Struct. Biol. 5,
264-265; Segal, D. J., Dreier, B., Beerli, R. R. & Barbas, C. F. (1999) Proc. Natl. Acad.
Sci. USA 96, 2758-2763; Isalan, M. & Choo, Y. (2000) J. Mol. Biol. 295, 471-477; and
Beerli, R. R., Dreier, B. & Barbas, C. F. (2000) Proc. Natl. Acad. Sci. USA 97, 1495-1500.
The resulting engineered zinc finger domains have increased our knowledge of
sequence-specific DNA recognition, as well as provided a wide range of potential tools for
medicine and biotechnology.

As a result of these and other studies on zinc finger engineering, it has been recognised 
that an individual zinc finger module does not necessarily recognise a simple nucleotide
triplet, as was first thought; but instead, can bind to an overlapping quadruplet of double
stranded DNA. See, for example, Isalan et al. (1997) Proc. Natl. Acad. Sci. USA 94,
5617-5621; and WO 98/53057. In this respect, zinc finger engineering strategies have been
particularly important for deciphering the mechanism and specificity of these interactions.

With the recent completion of the human genome project and the rapidly advancing fields
of transgenic animals and plants, thousands of uncharacterised (and characterised) genes
have become (and will become) valid targets for functional genomics and other such projects.
Concomitantly, engineered zinc finger peptides (often as a component of "designer"
transcription factors) are emerging as one of the most universal and desirable ways of
regulating the expression of specific genes within cells. See, for example, Choo, Y.,
Sanchez-Garcia, I. & Klug, A. (1994) Nature 372: 642-645; Beerli, R. R., Dreier, B. &
Barbas, C. F. III (2000) Proc. Natl. Acad. Sci. USA 97: 1495-1500; Kim, J-S. & Pabo, C.
O. (1998) Proc. Natl. Acad. Sci. USA 95: 2812-2817; Kang, J. S. & Kim, J-S. (2000) J.
Biol. Chem. 275: 8742-8748; Zhang et al. (2000) J. Biol. Chem. 275: 33850-33860; Liu
et al. (2001) J. Biol. Chem. 276: 11323-11334; Ren et al. (2002) Genes Dev. 16: 27-32;
and WO 00/41566.

Notwithstanding the remarkable progress in zinc finger engineering, there remain several
issues that limit the use of engineered zinc fingers for such applications. Points of
particular concern include the potential immunogenicity of non-natural zinc fingers, and
the 'fine-tuning' of particular aspects of the protein-DNA interactions to obtain optimal
and specific zinc finger-nucleic acid contacts.

The present invention overcomes problems of immunogenicity and of achieving optimal
binding specificity, by exploiting the vast repertoire of naturally occurring zinc fingers to
construct targeted zinc finger proteins having novel specificities.

Immunogenicity 

The main function of the immune system is to detect, and render harmless, foreign
particles which have invaded the body as a whole, or individual cells or organs. 'Foreign'
in this context means non-host, i.e. a substance which has originated from a different
species, or one which has originated as a result of a mutational event (such as might
generate a malignant cell). On encountering such an antigenic particle, either in solution
or on the surface of an infected cell, the body's defences rapidly destroy/remove it by
complex pathways which involve the interaction of many members of the immune
system. For a good overview of immunology see Roitt, Essential Immunology, Blackwell
Science Ltd. and Roitt, I., Brostoff, J. & Male, D. Immunology, 4th Ed. Mosby. Hence, all
biological therapeutic agents, such as peptides, nucleic acids, viruses, etc., risk eliciting
an immune response in the recipient. Particularly for cases in which repeated doses of a
therapeutic agent are required, this response can be strong and potentially dangerous to
the host organism.

The immune system functions through either innate or adaptive responses. The innate 
response is usually the body's first internal line of defence. Phagocytic cells recognise 
and bind to foreign objects in extracellular environments. Once bound, the foreign object
is internalised and destroyed. Foreign therapeutic agents such as peptides and nucleic
acids, which are administered directly to the blood stream of the recipient, risk being
detected and possibly destroyed before they even reach their intended target. This
response is one of primitive non-specific recognition of non-host agents, and does not 
adapt with time or exposure to the antigen. 

Foreign therapeutic agents (or infectious agents such as bacteria and viruses) which
evade the innate immune response, and may have been successfully delivered to a
particular cell, have not necessarily avoided the host's immune system. Proteins that are
expressed in cells are routinely degraded within lysosomes, and short peptide fragments,
generally of between 6 and 9 amino acids, are transported to the cell surface and
presented to the host's immune system. This is the start of the host's second internal
defence mechanism against invasion, the adaptive immune response. The proteins
responsible for displaying such peptide fragments are known as major histocompatibility
complex (MHC) proteins. Lymphocyte cells, known as T-lymphocytes, dock with the
MHC proteins and scan the peptide fragments displayed. Contact of a T-lymphocyte with
a fragment specifically recognised as not belonging to the host organism initiates an
immunological cascade which ultimately results in the host cell being destroyed or
undergoing apoptosis. This mechanism is one of specific recognition, and once
recognised as foreign, the antigen is 'remembered' so that any future invasions by the
agent are dealt with more and more rapidly. B-cells are another type of lymphocyte that
recognise extracellular particles and then produce and release antibodies to help combat
the agent.

To avoid potentially damaging the host organism and to ensure the successful delivery
and action of a therapeutic peptide it is important to make it as much like a host protein as
is reasonably possible. In the case of synthesised therapeutic antibodies for human use, a
great deal of work has gone into the 'humanisation' of antibodies produced by other
animal species (see EP 0239400). In this invention we present a solution for the
equivalent problem associated with zinc finger therapeutic peptides.

To some extent, prior art zinc finger engineering strategies have attempted to minimise
the risk of eliciting immune responses by using an engineering scaffold that is compatible
with (i.e., that originates from) the recipient, and by limiting the sizes of the varied regions
within the final product. For example, typical engineered zinc fingers utilize a scaffold
such as the three-finger DNA-binding domain of Zif268 (containing approximately 100
amino acid residues). Because the amino acid sequence of Zif268 is completely
conserved in a variety of species, including mice and humans, the scaffold is not itself
immunogenic in these species. However, in order to engineer new DNA-binding
domains, stretches of approximately 7 amino acids must be varied within each zinc
finger. These sequences of 7 amino acids represent modifications in positions -1, 1, 2, 3,
4, 5, and 6 of the α-helix of each finger. Although these engineered regions are
considered to be relatively small, they are approximately the length of the peptide
fragments displayed on the surface of cells by MHC molecules. Hence, they may provide
antigenic peptide fragments in several registers of the amino acid sequence, which may
result in dangerous and/or undesirable immune responses in the host.

Accordingly, it is not known whether this type of engineering strategy will be entirely
sufficient to avoid all potential undesirable effects, or indeed whether it will create the
optimal framework for all zinc finger-nucleic acid interactions.

In addition to the zinc fingers themselves, it is also possible that inter-finger linker
sequences could present potential immunological problems. Fortunately, natural zinc
finger proteins display strong conservation and homology in their linker sequence. A
very large number of natural fingers are joined by the canonical linker peptide -TGEKP-
(SEQ ID NO:3), located between the final zinc chelating residue (usually histidine) of the
first finger, and the first residue of the second finger (usually a large hydrophobic residue
such as tyrosine or phenylalanine, which begins the β-sheet). Hence, the use of the
canonical linker sequence -TGEKP- (SEQ ID NO:3), to join natural zinc finger modules
in a non-natural order, will reduce the possibility of eliciting an undesirable immune
reaction to a minimum. Furthermore, there are so many natural zinc fingers which are
already joined by canonical linker sequences, that if deemed necessary, the database of
natural zinc fingers used for the construction of poly-zinc finger peptides may be
restricted to those already flanked by such linkers.

The periodicity of zinc fingers and their amenability to linkage using the TGEKP (SEQ
ID NO:3) motif is illustrated in Table 2.

                                α-HELIX                        LINKER
                               -1123456
YA CPVESCDRRFS (SEQ ID NO:24)   RSDELTRHIRIH (SEQ ID NO:25)    TGEKP
PQ CRI  CMRNFS (SEQ ID NO:26)   RSDHLSTHIRTH (SEQ ID NO:27)    TGEKP
FA CDI  CGRKFA (SEQ ID NO:28)   RSDERKRHTKIH (SEQ ID NO:29)    TGEKP

Table 2. A functional three-finger DNA-binding domain based on the peptide sequence
of Zif268. TGEKP linker motifs are underlined. The helical residues of each zinc finger
are numbered relative to the first helical position, position +1. Conserved cysteines and
histidines forming the classical Cys2His2 zinc finger core are shown in bold.
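
The modular layout of Table 2 lends itself to a simple illustration of how finger modules, a linker and an optional leader peptide fit together. The sketch below is a minimal example under stated assumptions: the finger sequences are those of Table 2 written without linkers, the linker is placed only between adjacent modules, and the function name assemble_poly_zinc_finger is hypothetical rather than a method described in this disclosure.

```python
CANONICAL_LINKER = "TGEKP"  # SEQ ID NO:3
LEADER = "MAERP"            # SEQ ID NO:17

# Finger modules from Table 2 (Zif268-based), written without linkers.
TABLE2_FINGERS = [
    "YACPVESCDRRFSRSDELTRHIRIH",
    "PQCRICMRNFSRSDHLSTHIRTH",
    "FACDICGRKFARSDERKRHTKIH",
]


def assemble_poly_zinc_finger(modules, linker=CANONICAL_LINKER, leader=""):
    """Concatenate modules N- to C-terminus with a linker between
    adjacent modules, optionally prepending a leader peptide.
    The module sequences themselves are left unchanged."""
    if not modules:
        raise ValueError("at least one module is required")
    return leader + linker.join(modules)


if __name__ == "__main__":
    print(assemble_poly_zinc_finger(TABLE2_FINGERS, leader=LEADER))
```

Note that Table 2 also shows a TGEKP motif after the third finger; the sketch omits any trailing linker, which is a design choice of this example only.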

Fine-Tuning of Zinc Finger-Nucleic Acid Interactions.

It has previously been shown that zinc fingers cannot simply be regarded as independent 
nucleic acid-binding modules. Isalan, M., Klug, A. & Choo, Y. (1998) Biochemistry 37,
12026-12033; Isalan, M., Choo, Y. & Klug, A. (1997) Proc. Natl. Acad. Sci. USA 94,
5617-5621. The interactions between adjacent zinc fingers can be complex and involve overlap
of binding sites, which means that optimal interfaces are not easily engineered through
rational design. Combinatorial library selection systems, which if designed correctly
necessarily result in interface compatibility, can help to engineer better optimisation of
the zinc finger-nucleic acid interface. See, for example, WO98/53057. However, all 
library selection systems suffer from the problem of library size, whereby because of 
physical constraints, it is impossible to include an exhaustive combination of 
randomisations to cover all potentially important sequence-space. For example, to 
optimise the zinc finger-nucleic acid interface, subtle amino acid variations may be 
needed, even from positions outside the recognition α-helix. Furthermore, alternative
approaches to zinc finger engineering, such as 'affinity maturation' through random 
mutation or gene shuffling, which may (to a limited extent) increase the coverage of 
sequence space, may also raise the probability of generating undesirable immunological
problems. Hence, it is possible that the creation of truly optimal zinc finger domains for
recognition of specific nucleic acid sequences may be outside the scope of traditional
engineering strategies. 

In contrast, naturally occurring zinc finger modules have already been 'fine-tuned' by
thousands of years of natural selection and are, under normal circumstances,
non-immunogenic in their host organism. The human genome project has revealed that zinc
finger-containing proteins constitute the second most abundant family of proteins in
humans, with well over 600 members. Since zinc finger proteins usually contain several
individual zinc finger modules, the human genome provides a repertoire of thousands of
natural zinc finger modules for the creation of composite binding polypeptides.
Furthermore, because there are only 64 (=4^3) possible 3 bp sequences and 256 (=4^4)
possible 4 bp sequences, it is likely that, for every potential 3- or 4-nucleotide target
sequence, a natural zinc finger domain exists which is capable of binding it. Consequently,
natural zinc fingers are a very useful resource for the production of composite binding
polypeptides comprising zinc fingers. At present, the natural binding site of many natural
zinc finger modules is not known. Thus, to be useful for the construction of composite
binding polypeptides, nucleotide sequence preferences for certain natural zinc fingers are
determined according to rules tables disclosed in the following section ("Binding
Specificity of Natural Zinc Finger Modules").

To create optimal poly-zinc finger peptides the potentially significant problem of
interface incompatibility must be addressed, since natural zinc finger modules will not
necessarily be compatible with each other when juxtaposed. In this respect, a library
construction and screening system is preferably employed which links natural zinc finger
modules in non-natural combinations, and screens them against possible target sequences
of greater than 3 or 4 bp in length (which represents the possible binding site of a single
zinc finger module), to determine optimal 2- or 3-finger domains. In this way, the
cooperative nature of zinc finger binding is taken into account in the design and selection
of composite binding polypeptides, and in the determination of the sequence specificity of
their binding. In one embodiment, a library of poly-zinc finger peptides containing at
least one natural zinc finger module is provided. Preferably, poly-zinc finger peptides of
the library contain at least two natural zinc finger modules.

c. Binding Specificity of Natural Zinc Finger Modules

Disclosed herein are certain improvements to current limitations on the use of customised 
zinc finger nucleic acid binding domains, through the use of natural zinc finger modules. 
By using either natural 1-finger or 2-finger sub-domains, and/or novel
combinatorially-mixed, pre-selected 2-finger sub-domains, it is possible to construct poly-zinc finger
peptides that bind any desired nucleotide target sequence, using non-natural combinations 
of natural zinc fingers. 

This approach is particularly suited for human gene therapy applications, but the
invention is not limited to zinc finger modules encoded by the human genome. For
applications within transgenic animals such as mice, chicken, etc., the same system can
be used, but incorporating natural zinc finger modules from those species instead (see
Example 3). The genome of any organism (e.g., animal, plant, bacterium, virus, etc.) can
thus provide a genetic 'toolbox' of non-immunogenic, structurally optimised zinc fingers
for applications in that organism.

Before such zinc finger modules can be utilised, however, it is essential that their optimal 
binding site is determined, in isolation, or preferably as part of a 2- or 3-finger 
subdomain. Natural zinc finger modules are advantageously fused into subdomains
comprising two or three zinc finger modules in random arrangement, optionally 
comprising an anchor finger, then subjected to binding site analysis. An 'anchor' zinc 
finger is one for which the binding specificity is known, such as, for example, finger 1 or 
finger 3 of Zif268, each of which binds the sequence 5'-GCG-3'. An anchor finger is 
attached to the N- or C-terminus of the zinc finger module(s) or subdomain for which the 
binding specificity is to be determined, and acts as an anchor to set the binding register 
for ttie binding site selection. For example, if the binding site preference of a pair of 
natural zinc fiaigers is to be determined, finger 1 of Zif268 may be fiised to the N- 
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terminus of the pair of natural fingers, and a 5'-GCG-3' anchor sequence is placed at the 
3' end of 6 or more randomised nucleotides. Selection of the optimal binding site may 
thus be conducted with an oligonucleotide containing the sequence 5'-XXX-XXX-GCG- 
3' (SEQ ID NO:30), where X is any specified nucleotide. The anchor sequence thereby 
5 allows the binding site preference of the zinc finger libraries to be easily determined. 
Such procedures are described in the Examples. 
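
As an illustration of how the anchor sets the register, the following is a minimal sketch of tallying selected binding sites. It assumes that selected oligonucleotides have already been sequenced and trimmed to the 9-nucleotide window; the function names and the example reads are hypothetical and are not taken from the Examples.

```python
from collections import Counter

ANCHOR = "GCG"  # 5'-GCG-3' triplet bound by the anchor finger


def position_counts(selected_sites, n_random=6, anchor=ANCHOR):
    """Tally bases at each randomised position of anchored sites."""
    counts = [Counter() for _ in range(n_random)]
    for site in selected_sites:
        site = site.upper()
        # The 3' anchor fixes the register; reads lacking it are skipped.
        if len(site) != n_random + len(anchor) or not site.endswith(anchor):
            continue
        for i, base in enumerate(site[:n_random]):
            counts[i][base] += 1
    return counts


def consensus(counts):
    """Most frequent base per position ('N' where no data were seen)."""
    return "".join(c.most_common(1)[0][0] if c else "N" for c in counts)


# Hypothetical reads; the last one lacks the anchor and is ignored.
sites = ["GGATAAGCG", "GGATACGCG", "GGCTAAGCG", "TTTTTTAAA"]
print(consensus(position_counts(sites)))
```

Because every accepted read ends in the anchor triplet, the six upstream positions can be compared column by column without any further alignment step.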

Screening for Zinc Finger Binding Specificity 

There are various approaches, known to those in the art, for screening nucleic acid
binding peptides for their binding specificity. To determine the binding specificity of, for
example, zinc finger peptides, procedures can be conducted using: (a) a library of zinc
fingers and a specified target sequence, to select one or more zinc finger peptides with a
particular binding preference; or (b) a single zinc finger peptide and a random population
of target sequences, to select one or more optimal binding sites for a particular peptide.
For many applications, such as for the creation of transcription factors for regulating
specific gene activity, it is often preferable to screen zinc finger libraries against specific
target sequences. In this way, the search is geared towards a particular application.
However, if the function or binding specificity of a natural protein is the object of the
investigation, a library of potential binding sites can be screened using a single peptide.
Some such methods are outlined below.

A typical method for screening libraries of nucleic acid binding polypeptides against
specific target sites is that of phage display. Phage display protocols generally involve
expressing the peptides under study as fusions with the gIII minor coat protein of
bacteriophage (J. McCafferty, R. H. Jackson, D. J. Chiswell, (1991) Protein Engineering
4, 955-961). Suitable protocols for the selection of zinc finger peptides have been
described and are well known to those in the art. See, for example, Choo, Y. & Klug, A.
(1994) Proc. Natl. Acad. Sci. U.S.A. 91, 11163-11167; Choo, Y., Sanchez-Garcia, I. &
Klug, A. (1994) Nature 372, 642-645; Choo, Y. (1998) Nature Struct. Biol. 5, 264-265;
Choo, Y. & Klug, A. (1997) Curr. Opin. Str. Biol. 7, 117-125; Isalan, M., Klug, A. &
Choo, Y. (1998) Biochemistry 37, 12026-12033; Isalan, M. & Choo, Y. (2000) J. Mol.
Biol. 295, 471-477; Isalan, M., Choo, Y. & Klug, A. (1997) Proc. Natl. Acad. Sci. USA 94,
5617-5621; WO 01/53480, WO 01/53479, WO 96/06166, WO 98/53057, WO 98/53058, WO
98/53059 and WO 98/53060 and references cited therein; see also Examples, infra. In
general, sequences comprising target sites are bound, such as through biotin-streptavidin,
to a solid support, such as a magnetic particle, or the surface of a tube or well. A solution
of phage expressing members of a library of zinc finger peptides is then added to the
immobilised target site. Non-bound phage are washed away and bound phage (containing
the DNA encoding the bound zinc finger peptide) are collected. The collected phage
sample is usually reused in further rounds of selection to enrich for the tightest binding
zinc finger peptide.

Phage display protocols based on random mutagenesis of zinc finger modules are known 
to have a number of limitations. First, as discussed above, the library size that can be 
expressed on the surface of phage is limited by the efficiency of procedures such as 
cloning and transformation. Furthermore, the efficiency of incorporation of gIII-zinc
finger fusions into phage and hence, zinc finger peptide expression, is determined by the
number of zinc finger modules. Therefore, 2-finger peptides are expressed more 
efficiently than 3-finger peptides and so on. For this reason, phage display protocols are 
generally limited to the assay of polypeptides comprising 3 or fewer zinc finger modules. 

An alternative to phage display is an in vitro selection system. In such a system, libraries
of zinc fingers can be produced by PCR using degenerate primer oligonucleotides.
Target binding sites are added to the end of the DNA encoding the zinc finger peptide.
Zinc finger peptide expression may be performed directly from PCR products using an in
vitro expression kit, such as the TNT T7 Quick Coupled Transcription/Translation
System for PCR DNA (Promega, Madison, WI, USA), or another suitable expression
system. The components of the expression reaction (including the zinc finger
gene/binding site) are compartmentalised by suspension in an emulsion, in such a way
that (on average) only one copy of the zinc finger gene / binding site is present in each
compartment. See, for example, Tawfik, D.S. & Griffiths, A.D. (1998) Nat. Biotechnol.
16: 652-656. Zinc finger peptides which bind the specified target site (and the gene
encoding them) can be collected using, for example, a suitable epitope tag (such as myc,
FLAG or HA tags), and the non-bound binding sites/zinc finger genes are removed. The
genes encoding zinc finger peptides that bind the required target site can then be
amplified by PCR and used in further rounds of selection if required.

A preferred method for selecting a zinc finger peptide which binds a specified target
sequence is described in Example 4. Briefly, the DNA encoding a library of zinc finger
peptides with an attached epitope tag is diluted into as many aliquots as it is possible to
screen (e.g. 384 or 1536 aliquots). This creates pools of sub-libraries with reduced
numbers of variants. The DNA is then amplified by PCR and used to produce protein,
from a suitable in vitro expression system, as described above. A specified binding site
with an attached biotin molecule, and a horseradish peroxidase (HRP)-conjugated
antibody to the peptide-attached epitope tag may then be added. Binding site / bound
zinc finger / antibody complexes may be collected by binding to streptavidin and the
samples are washed to remove unbound zinc finger and antibodies. The samples
containing the highest amount of bound zinc finger peptide can be detected by adding an
HRP substrate solution. The original DNA stock from such positive samples may then be
diluted into aliquots (as above), PCR-amplified and used for the next round of selection.
In this way, pools of zinc finger encoding genes with the desired activity are isolated,
subdivided into pools of reduced variation and re-isolated until the most active clone is
identified.
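
The iterative subdivision described above can be sketched as a simple loop. This is only a toy model under stated assumptions: the clone names and activity values are hypothetical, dilution into aliquots is replaced by dealing clones into sub-pools, and the HRP signal of a pool is modelled as the summed activity of its members.

```python
def sib_select(library, signal, n_pools=4):
    """Split a library into pools, keep the strongest pool, and repeat
    until a single clone remains."""
    if n_pools < 2:
        raise ValueError("n_pools must be at least 2")
    pool = list(library)
    while len(pool) > 1:
        k = min(n_pools, len(pool))
        # Deal clones into k sub-pools (a stand-in for dilution into aliquots).
        sub_pools = [pool[i::k] for i in range(k)]
        # Model the pool signal as the summed activity of its members.
        pool = max(sub_pools, key=lambda p: sum(signal[c] for c in p))
    return pool[0]


# Hypothetical library of 16 clones with one strong binder.
library = ["zf%d" % i for i in range(16)]
signal = {clone: 1.0 for clone in library}
signal["zf11"] = 50.0
print(sib_select(library, signal))
```

With a summed signal, a pool holding many moderate binders could outscore a pool holding one strong binder, which is one reason smaller pools of reduced variation are screened in successive rounds.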

Principal advantages of the in vitro systems described above are: (a) there is virtually no
limit to the library size which can be screened (up to 10^^ different PCR products can
easily be made); and (b) polypeptides comprising larger numbers of linked zinc finger
modules (e.g., 4, 5, 6, 7, or more) can be assayed. Another in vitro selection system
which can be used is polysome/ribosome display. See, for example, Mattheakis, L.C.,
Bhatt, R.R. & Dower, W.J. (1994) Proc. Natl. Acad. Sci. USA, 91: 9022-9026; and WO
00/27878.

Protocols for the reverse selection procedure, i.e. the selection of a particular binding site
from a mixed population using a single nucleic acid binding polypeptide, include SELEX
(systematic evolution of ligands by exponential enrichment) and microarray techniques.
The SELEX procedure has been well described. See, for example, Drolet, D.W., Jenison,
R.D., Smith, D.E., Pratt, D. & Hicke, B.J. (1999) Comb. Chem. High Throughput Screen
2: 271-278; Burden, D.A. & Osheroff, N. (1999) J. Biol. Chem. 274: 5227-5235;
Shultzaberger, R.K. & Schneider, T.D. (1999) Nucleic Acids Res. 27: 882-887; Marozzi,
A., Meneveri, R., Giacca, M., Gutierrez, M.J., Siccardi, A.G. & Ginelli, E. (1998) J.
Biotechnol. 15: 117-128; and US Patents No. 5,270,163; 5,475,096; 5,595,877;
5,670,637; 5,696,249; 5,817,785 and 6,331,398. A single nucleic acid binding
polypeptide is expressed, either in vitro or in vivo, and screened against a library of target
sequences. Nucleic acid binding polypeptides are collected (along with any bound target
sites) using an epitope tag (as above) or another suitable procedure. Bound target sites
are amplified by PCR and may be used in further rounds of selection, to enrich for the
optimal binding site, or sequenced.

Microarray technology provides a method of screening a particular polypeptide or nucleic
acid against thousands to millions of target sequences on a single solid support such as, for
example, a glass or nitrocellulose slide. For example, the members of a library encoding
polypeptides comprising 2 linked zinc fingers will bind a 6 bp recognition sequence.
Hence, there are 4096 (= 4^6) unique binding sites for such a library. All 4096 of these
sites can be arrayed onto a single glass slide, for example, allowing a specified 2-finger
peptide to be screened simultaneously against every possible binding site. The amount of
binding to each target sequence can be visualised and quantified using simple
fluorescence measurements. For example, the zinc finger peptide may be expressed in
vitro, or on the surface of phage. Isolated zinc finger peptides may contain an epitope tag
for labelling purposes, whereas bound phage can be detected using a primary antibody
against a phage coat protein, such as gVIII. A secondary antibody conjugated to, for
example, R-phycoerythrin, horseradish peroxidase or alkaline phosphatase, can be used to
provide a visible, quantifiable signal when a suitable substrate is applied. See, for
example, Bulyk et al. (2001) Proc. Natl. Acad. Sci. USA 98(13): 7158-7163, which is
incorporated, by reference, in its entirety.
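
The count of 4096 sites follows from four possible bases at each of six positions. A minimal sketch of enumerating such an array layout is shown below; the function name is hypothetical and the ordering of sites is arbitrary.

```python
from itertools import product


def all_sites(length):
    """Every DNA sequence of the given length, in alphabetical order."""
    return ["".join(bases) for bases in product("ACGT", repeat=length)]


sites = all_sites(6)  # one spot per possible 6 bp site for a 2-finger peptide
print(len(sites))
```

The same function gives 64 sites for a single finger (3 bp) and 262144 for three fingers (9 bp), which indicates how quickly the array size grows with the number of fingers.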

Prediction of Binding Specificity

The screening approaches described above rely on the assay of large libraries of 
randomly-selected natural zinc finger modules, to obtain one or more zinc finger modules 
that optimally bind a particular target nucleic acid sequence. In order to simplify the 
process further and ensure a more rapid selection of optimal zinc finger modules for a 
particular target site, sub-libraries can be created. In this disclosure, the term
'sub-library' refers to a library of natural zinc finger modules that have been roughly
categorised according to their predicted binding specificity. For example, the total
population of natural zinc fingers can be sub-divided to create libraries comprising zinc
finger modules whose predicted binding sites are guanine (G) rich, cytosine (C) rich,
adenine (A) rich or thymine (T) rich. Alternatively, sub-libraries can be categorised as
binding G in the 3' position, in the central position, or in the 5' position of a nucleotide
triplet, etc. Alternatively, sub-libraries can be created which comprise zinc finger
modules predicted to bind a particular triplet sequence such as, for example, GGG, GGA,
GGC, GGT, GAG, GCG, GTG, etc. This approach combines knowledge of the modes of
zinc finger-nucleic acid recognition, gained from studies on artificial zinc finger variants,
with the benefits of combinatorial library selection. It also takes into account the fact that
concerted interactions between adjacent zinc fingers, i.e. overlapping contacts, can affect
the binding affinity and/or specificity of individual zinc fingers. See, for example,
Isalan, M., Klug, A. & Choo, Y. (1998) Biochemistry 37, 12026-12033; Isalan, M.,
Choo, Y. & Klug, A. (1997) Proc. Natl. Acad. Sci. USA 94, 5617-5621. Thus, for example, a
composite binding polypeptide comprising two fingers, each having a predicted binding
specificity for a particular triplet, can be easily screened to determine if that pair of
fingers is compatible with each other for binding to the 6-nucleotide target site
comprising their individual target sequences. This strategy is described further in the
Examples.
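
A rough categorisation of this kind can be sketched as follows. The finger names and predicted triplets are hypothetical, and the threshold for calling a triplet "rich" in a base (at least two of three positions) is an assumption made for illustration only.

```python
from collections import defaultdict


def base_rich_sublibraries(predicted, min_count=2):
    """Group fingers by the base that dominates their predicted triplet."""
    subs = defaultdict(list)
    for name, triplet in predicted.items():
        for base in "GCAT":
            if triplet.upper().count(base) >= min_count:
                subs[base + "-rich"].append(name)
    return dict(subs)


def by_position(predicted, base, index):
    """Fingers predicted to bind `base` at triplet index 0 (5'), 1 or 2 (3')."""
    return [n for n, t in predicted.items() if t.upper()[index] == base]


# Hypothetical predicted triplets, written 5' to 3'.
predicted = {"zfA": "GGA", "zfB": "GCG", "zfC": "TAT", "zfD": "ACT"}
print(base_rich_sublibraries(predicted))
print(by_position(predicted, "G", 2))
```

A finger such as zfD, whose predicted triplet has no dominant base, falls into no base-rich sub-library under this threshold but can still be assigned by position or by exact triplet.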

For the process of creating sub-libraries of natural zinc fingers according to predicted
binding preference, the rules set forth in international patent applications WO 96/06166,
WO 98/53057, WO 98/53058, WO 98/53059 and WO 98/53060, and described in more
detail below, are used. These rules allow the assignment of an amino acid residue, in an
appropriate position of the recognition region of a zinc finger module (generally
comprising amino acids -1 through +6, with respect to the start of the alpha-helical
portion of the finger), which will bind a specified nucleotide in a triplet or quadruplet
target subsite. However, these rules can also be used to predict the sequence of a target
subsite that would be preferentially bound by a zinc finger of given amino acid sequence.
In particular, the identity of the amino acid residing at a particular position in the
recognition region of a natural zinc finger module can be used to predict the identity of a
nucleotide at a particular location in a target subsite. These 'rules' should be considered
as a guide to target site preference and not a guaranteed prediction, as binding site
specificity may be determined by variations elsewhere in the zinc finger module (i.e.
outside of the recognition region), may be influenced by context, or may be influenced by
factors as yet unknown. It should also be noted that some rules may be more generally
applicable than others.

In the application of these rules, it should be noted that the recognition region of a zinc
finger aligns such that the N-terminal to C-terminal sequence of the finger is arranged
along the nucleic acid strand to which it binds in a 3'-to-5' direction. As a result, when a
zinc finger sequence and a nucleic acid sequence (to which the finger binds) are aligned,
the primary interactions occur between the zinc finger and the 'minus' strand of the
nucleic acid sequence (i.e. the strand which has a 3'-to-5' orientation). Furthermore, as
stated above, the recognition region of a zinc finger comprises amino acids -1 through
+6, with respect to the start of the alpha-helical portion of the finger. With respect to a
particular zinc finger, an amino acid residue designated ++2 refers to the residue present
in the adjacent (in the C-terminal direction) zinc finger, which (in certain instances)
buttresses an amino acid-nucleotide interaction and/or participates in a cross-strand
interaction with a nucleotide.
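
One practical consequence of this antiparallel arrangement is that the N-terminal finger of a peptide contacts the 3'-most triplet of the bound strand. A minimal sketch of assembling a predicted site from per-finger triplets is given below; the function name is hypothetical, and 'N' is used here simply as a placeholder for an undetermined base.

```python
def target_site(triplets_n_to_c):
    """Join per-finger triplets (each written 5' to 3', fingers listed from
    N- to C-terminus) into the full site written 5' to 3'."""
    for triplet in triplets_n_to_c:
        if len(triplet) != 3:
            raise ValueError("each finger is assigned exactly three bases")
    # Finger order runs 3'-to-5' along the strand, so reverse it.
    return "".join(reversed(triplets_n_to_c))


# An N-terminal anchor finger binding GCG ahead of two uncharacterised fingers.
print(target_site(["GCG", "NNN", "NNN"]))
```

The anchor triplet therefore appears at the 3' end of the assembled site, in line with the placement of the 5'-GCG-3' anchor sequence 3' of the randomised nucleotides when the anchor finger is fused at the N-terminus.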

Thus, the following set of rules can be used to predict a 3 bp target subsite for a given
natural zinc finger module: (a) if the 5' base in the triplet is G, then position +6 in the
α-helix is Arg; or position +6 is Ser or Thr and position ++2 is Asp; (b) if the 5' base in the
triplet is A, then position +6 in the α-helix is Gln and ++2 is not Asp; (c) if the 5' base in
the triplet is T, then position +6 in the α-helix is Ser or Thr and position ++2 is Asp; (d) if
the 5' base in the triplet is C, then position +6 in the α-helix may be any amino acid,
provided that position ++2 in the α-helix is not Asp; (e) if the central base in the triplet is
G, then position +3 in the α-helix is His; (f) if the central base in the triplet is A, then
position +3 in the α-helix is Asn; (g) if the central base in the triplet is T, then position +3
in the α-helix is Ala, Ser or Val; provided that if it is Ala, then one of the residues at -1 or
+6 is a small residue; (h) if the central base in the triplet is C, then position +3 in the
α-helix is Ser, Asp, Glu, Leu, Thr or Val; (i) if the 3' base in the triplet is G, then position
-1 in the α-helix is Arg; (j) if the 3' base in the triplet is A, then position -1 in the α-helix
is Gln; (k) if the 3' base in the triplet is T, then position -1 in the α-helix is Asn or Gln; (l)
if the 3' base in the triplet is C, then position -1 in the α-helix is Asp.
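
Read in the predictive direction, rules (a) to (l) map each contact residue to a set of permitted bases rather than to a single base. The following is a minimal sketch of that reading. The function name and the use of three-letter residue codes are illustrative choices, and the set of 'small' residues is an assumption because it is not defined in this passage. The Zif268 finger 1 helix residues used in the example (Arg at -1, Glu at +3, Arg at +6, followed by a finger with Asp at +2) are taken from general knowledge of that protein rather than from the text above.

```python
SMALL = frozenset({"Gly", "Ala", "Ser"})  # assumption: "small" is not defined here


def predict_triplet(minus1, plus3, plus6, next_plus2=None, small=SMALL):
    """Return (5', central, 3') sets of bases allowed by rules (a)-(l).

    next_plus2 is position ++2, i.e. +2 of the adjacent C-terminal finger
    (None if there is no such finger).
    """
    asp_next = next_plus2 == "Asp"
    ser_thr6 = plus6 in ("Ser", "Thr")

    five = set()
    if plus6 == "Arg" or (ser_thr6 and asp_next):
        five.add("G")  # rule (a)
    if plus6 == "Gln" and not asp_next:
        five.add("A")  # rule (b)
    if ser_thr6 and asp_next:
        five.add("T")  # rule (c)
    if not asp_next:
        five.add("C")  # rule (d): any residue at +6

    central = set()
    if plus3 == "His":
        central.add("G")  # rule (e)
    if plus3 == "Asn":
        central.add("A")  # rule (f)
    if plus3 in ("Ser", "Val") or (
        plus3 == "Ala" and (minus1 in small or plus6 in small)
    ):
        central.add("T")  # rule (g)
    if plus3 in ("Ser", "Asp", "Glu", "Leu", "Thr", "Val"):
        central.add("C")  # rule (h)

    three = set()
    if minus1 == "Arg":
        three.add("G")  # rule (i)
    if minus1 == "Gln":
        three.add("A")  # rule (j)
    if minus1 in ("Asn", "Gln"):
        three.add("T")  # rule (k)
    if minus1 == "Asp":
        three.add("C")  # rule (l)
    return five, central, three


# Zif268 finger 1 contacts, with Asp at +2 of the following finger.
print(predict_triplet("Arg", "Glu", "Arg", next_plus2="Asp"))
```

For these residues each set contains a single base, giving GCG, which agrees with the 5'-GCG-3' specificity stated for finger 1 of Zif268. Other residue combinations return several bases per position or an empty set, reflecting the degeneracy and incompleteness of the rules.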

Furthermore, a natural zinc finger module may be capable of binding specifically to a
four-nucleotide target subsite that overlaps with the target subsite of an adjacent zinc
finger. In this case a different set of 'rules' can be used to determine predicted binding
sites for each zinc finger module. Accordingly, in the description below, the overlapping
4 bp binding site is described such that position 4 is the 5' base of a typical triplet binding
site, position 3 is the central position of a typical triplet, position 2 is the 3' position of a
typical triplet, and position 1 is the complement of the nucleotide which is contacted by
the cross strand interaction from the +2 position of the zinc finger module. Position 1 can
also be considered to be the 5' base of the triplet or quadruplet contacted by an adjacent
(in the N-terminal direction) finger, if present.

Binding to each base of a quadruplet by an α-helical zinc finger nucleic acid binding
motif in a natural protein can be predicted with reference to the following rules: (a) if
base 4 in the quadruplet is G, then position +6 in the α-helix is Arg or Lys; (b) if base 4 in
the quadruplet is A, then position +6 in the α-helix is Glu, Asn or Val; (c) if base 4 in the
quadruplet is T, then position +6 in the α-helix is Ser, Thr, Val or Lys; (d) if base 4 in the
quadruplet is C, then position +6 in the α-helix is Ser, Thr, Val, Ala, Glu or Asn; (e) if
base 3 in the quadruplet is G, then position +3 in the α-helix is His; (f) if base 3 in the
quadruplet is A, then position +3 in the α-helix is Asn; (g) if base 3 in the quadruplet is T,
then position +3 in the α-helix is Ala, Ser or Val; provided that if it is Ala, then one of the
residues at -1 or +6 is a small residue; (h) if base 3 in the quadruplet is C, then position
+3 in the α-helix is Ser, Asp, Glu, Leu, Thr or Val; (i) if base 2 in the quadruplet is G,
then position -1 in the α-helix is Arg; (j) if base 2 in the quadruplet is A, then position -1
in the α-helix is Gln; (k) if base 2 in the quadruplet is T, then position -1 in the α-helix is
His or Thr; (l) if base 2 in the quadruplet is C, then position -1 in the α-helix is Asp or
His; (m) if base 1 in the quadruplet is G, then position +2 is Glu; (n) if base 1 in the
quadruplet is A, then position +2 is Arg or Gln; (o) if base 1 in the quadruplet is C, then
position +2 is Asn, Gln, Arg, His or Lys; (p) if base 1 in the quadruplet is T, then position
+2 is Ser or Thr.

The above rules may be further refined to those described below: (a) if base 4 in the
quadruplet is G, then position +6 in the α-helix is Arg; or position +6 is Ser or Thr and
position ++2 is Asp; (b) if base 4 in the quadruplet is A, then position +6 in the α-helix is
Gln and ++2 is not Asp; (c) if base 4 in the quadruplet is T, then position +6 in the
α-helix is Ser or Thr and position ++2 is Asp; (d) if base 4 in the quadruplet is C, then
position +6 in the α-helix may be any amino acid, provided that position ++2 in the
α-helix is not Asp; (e) if base 3 in the quadruplet is G, then position +3 in the α-helix is His;
(f) if base 3 in the quadruplet is A, then position +3 in the α-helix is Asn; (g) if base 3 in
the quadruplet is T, then position +3 in the α-helix is Ala, Ser or Val; provided that if it is
Ala, then one of the residues at -1 or +6 is a small residue; (h) if base 3 in the quadruplet
is C, then position +3 in the α-helix is Ser, Asp, Glu, Leu, Thr or Val; (i) if base 2 in the
quadruplet is G, then position -1 in the α-helix is Arg; (j) if base 2 in the quadruplet is A,
then position -1 in the α-helix is Gln; (k) if base 2 in the quadruplet is T, then position -1
in the α-helix is Asn or Gln; (l) if base 2 in the quadruplet is C, then position -1 in the
α-helix is Asp; (m) if base 1 in the quadruplet is G, then position +2 is Asp; (n) if base 1 in
the quadruplet is A, then position +2 is not Asp; (o) if base 1 in the quadruplet is C, then
position +2 is not Asp; (p) if base 1 in the quadruplet is T, then position +2 is Ser or Thr.
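
In the refined rules, bases 2 to 4 follow the same residue assignments as the triplet rules, and only base 1 is new. A minimal sketch of the predictive reading of refined rules (m) to (p) is shown below; the function name is hypothetical and residues are given as three-letter codes.

```python
def predict_base1(plus2):
    """Bases permitted at quadruplet position 1 for a given +2 residue,
    following refined rules (m)-(p)."""
    bases = set()
    if plus2 == "Asp":
        bases.add("G")       # rule (m)
    else:
        bases.update("AC")   # rules (n) and (o): +2 is not Asp
    if plus2 in ("Ser", "Thr"):
        bases.add("T")       # rule (p)
    return bases


print(sorted(predict_base1("Asp")), sorted(predict_base1("Ser")))
```

Only Asp gives a unique prediction at position 1; every other residue leaves at least A and C open, so this position is best treated as a constraint on neighbouring fingers rather than as a precise read-out.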

The rules therefore predict that the presence of an Asp (D) residue at position +2 will
preclude binding to either A or C by an amino acid at position +6 in an adjacent
N-terminal finger. See Isalan, M., Klug, A. & Choo, Y. (1998) Biochemistry 37, 12026-12033;
Isalan, M., Choo, Y. & Klug, A. (1997) Proc. Natl. Acad. Sci. USA 94, 5617-5621. Therefore,
natural zinc fingers containing Asp, Glu, Asn or Gln at +6 are likely to be incompatible
with any C-terminal finger containing an Asp residue at position +2. Although there are
many such rules to describe the overlap between adjacent zinc fingers, a certain degree of
degeneracy exists in these rules. Nonetheless, physical selection procedures (e.g., library
construction and screening) can be used to extract optimal pairs of fingers for any given
target subsite interface.
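
This incompatibility can be used as a quick pre-filter before physical selection. The sketch below encodes only the single constraint stated above; the function and constant names are hypothetical, and a result of True means no clash is predicted, not that the pair is known to bind well.

```python
CLASHING_PLUS6 = frozenset({"Asp", "Glu", "Asn", "Gln"})


def likely_compatible(n_term_plus6, c_term_plus2):
    """False if the N-terminal finger carries Asp, Glu, Asn or Gln at +6
    while the adjacent C-terminal finger carries Asp at +2."""
    return not (c_term_plus2 == "Asp" and n_term_plus6 in CLASHING_PLUS6)


print(likely_compatible("Gln", "Asp"), likely_compatible("Arg", "Asp"))
```

Pairs that pass the filter still need to be screened, since the rules are degenerate and context can alter binding.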

Not all natural zinc fingers have a DNA-binding function. For example, it is known that
many zinc fingers, such as those from TFIIIA, bind to RNA (Clemens, K. R. et al., (1993)
Science 260: 530-533; Bogenhagen, D.F. (1993) Mol. Cell. Biol. 13: 5149-5158; Searles,
M. A. et al., J. Mol. Biol. 301: 47-60 (2000)). The rules governing RNA binding by zinc
fingers are less well understood than those of DNA binding, but some RNA binding zinc
fingers can be identified on the basis of a characteristic sequence motif. Clemens, K. R.
et al., (1993) Science 260: 530-533; Bogenhagen, D.F. (1993) Mol. Cell. Biol. 13: 5149-
5158; Searles, M. A. et al. (2000) J. Mol. Biol. 301: 47-60. Furthermore, some zinc
fingers, such as those from the protein Ikaros, are able to form protein-protein
interactions. Such zinc fingers often contain large hydrophobic patches. Mackay, J. P. &
Crossley, M. (1998) Trends Biochem. Sci. 23: 1-4.

To this end, applied bioinformatic processing can help to determine which candidates in a
particular genome are best suited to fulfilling a particular function, such as DNA-binding.
In the case of zinc fingers, numerous documented databases exist denoting amino acid
residues that are most likely to be found at particular positions within a DNA-binding
zinc finger. See, for example, Isalan, M., Klug, A. & Choo, Y. (1998) Biochemistry 37,
12026-12033; Choo, Y. & Klug, A. (1997) Curr. Opin. Str. Biol. 7, 117-125; WO
98/53060; WO 98/53059; WO 98/53058. As an example, disclosed herein is a database
of approximately 200 natural human zinc fingers which have been selected (on the basis
of coded contacts) as having potentially useful DNA-binding activity (see Example 1).
Also disclosed in Example 1 are the predicted DNA target sequences of these zinc
fingers, assigned according to the rules set out above.

As the human genome contains almost 700 zinc finger-containing proteins, there are
many other candidates that can be included in a more inclusive library of natural zinc
fingers. A selection of these are disclosed in Example 2.

Similar work can be carried out in other organisms, such as farm (cows, pigs, sheep,
chickens, etc.), laboratory (monkeys, rats, mice, etc.) and domestic (dogs, cats, etc.)
animals. In this case, it is necessary to select natural zinc finger modules from the
respective genomes of such organisms. Examples of zinc finger modules which have
been selected from mouse, chicken and certain plant genomes, are disclosed in the
Examples.

d. Zinc Finger Chimeric Peptides 

In a preferred embodiment, the composite binding polypeptides described herein 
comprise chimeric nucleic acid binding polypeptides. 

A chimeric nucleic acid binding polypeptide, also referred to as a fusion
polypeptide, comprises a binding domain (comprising a number of nucleic acid binding
polypeptide modules or fingers) designed to bind specifically to a target nucleotide
sequence, together with one or more further biological effector domains or functional
domains. The terms "biological effector domain" and "functional domain" refer to any
polypeptide (or functional fragment thereof) that has a biological function. Included are
enzymes, receptors, regulatory domains, transcriptional activation or repression domains,
binding sequences, dimerisation, trimerisation or multimerisation sequences, sequences
involved in protein transport, localisation sequences such as subcellular localisation
sequences, nuclear localisation, protein targeting or signal sequences. Furthermore,
biological effector domains may comprise polypeptides involved in chromatin
remodelling, chromatin condensation or decondensation, DNA replication, transcription,
translation, protein synthesis, etc. Fragments of such polypeptides comprising the
relevant activity (i.e., functional fragments) are also included in this definition. Preferred
biological effector domains include transcriptional modulation domains such as
transcriptional activators and transcriptional repressors, as well as their functional
fragments.

The effector domain(s) can be covalently or non-covalently attached to the 
binding domain. 

Chimeric nucleic acid binding polypeptides preferably comprise transcription 
factor activity, for example, a transcriptional modulation activity such as transcriptional 
activation or transcriptional repression activity. For example, a zinc finger chimeric
polypeptide may comprise a binding domain designed to bind specifically to a particular
nucleotide sequence, and one or more further biological effector domains, preferably a
transcriptional activation or repression domain, as described in further detail below. The 
zinc finger chimeric polypeptide may comprise one or more zinc fingers or zinc finger 
binding modules. 

Preferably, in the case of a chimeric polypeptide comprising transcriptional
modulation activity, a nuclear localisation domain is attached to the DNA binding domain
to direct the chimeric polypeptide to the nucleus.

Generally, a chimeric nucleic acid binding polypeptide, such as a chimeric zinc 
finger polypeptide, can also include an effector domain to regulate gene expression. The 
effector domain can be directly derived from a basal or regulated transcription factor such
as, for example, transactivators, repressors, and proteins that bind to insulator or silencer
sequences. See, for example, Choo & Klug (1995) Curr. Opin. Biotech. 6: 431-436;
Choo, Y. & Klug, A. (1997) Curr. Opin. Str. Biol. 7, 117-125; Rebar & Pabo (1994)
Science 263: 671-673; Jamieson et al. (1994) Biochem. 33: 5689-5695; Goodrich et al.
(1996) Cell 84: 825-830; Vostrov, A. A. & Quitschke, W. W. (1997) J. Biol. Chem. 272:
33353-33359 and WO 00/41566 and references disclosed therein. Other useful domains
are derived from receptors such as, for example, nuclear hormone receptors (Kumar, R. &
Thompson, E. B. (1999) Steroids 64: 310-319), and their co-activators and co-repressors
(Ugai, H. et al. (1999) J. Mol. Med. 77: 481-494).

A chimeric nucleic acid binding polypeptide can also include other domains that
may be advantageous within the context of the control of gene expression. Such domains
include, but are not limited to, protein-modifying domains such as histone
acetyltransferases, kinases, methylases and phosphatases, which can silence or activate
genes by modifying DNA structure or the proteins that associate with nucleic acids. See,
for example, Wolffe, Science 272: 371-372 (1996); Taunton et al., Science 272: 408-411
(1996); Hassig et al., Proc. Natl. Acad. Sci. USA 95: 3519-3524 (1998); Wang, Trends
Biochem. Sci. 19: 373-376 (1994); and Schonthal, Semin. Cancer Biol. 6: 239-248
(1995). Additional useful effector domains include those that modify or rearrange nucleic
acid molecules such as methyltransferases, endonucleases, ligases, recombinases etc.
See, for example, Wood, Ann. Rev. Biochem. 65: 135-167 (1996); Sadowski, FASEB J.
7: 760-767 (1993); Cheng, Curr. Opin. Struct. Biol. 5: 4-10 (1995); Wu et al. (1995)
Proc. Natl. Acad. Sci. USA 92: 344-348; Nahon & Raveh, Nucleic Acids Res. 1998 Mar
1; 26(5): 1233-9; Smith et al., Nucleic Acids Res. 1999 Jan 15; 27(2): 674-81; and Smith et
al. (2000) Nucleic Acids Res. Sept 1; 28(17): 3361-9. It will be appreciated that the
biological effector domain portion of the chimeric polypeptide may itself also comprise
such activities, without the need for further additional domains.

For the purpose of gene activation, zinc finger domains may be fused to the VP64
domain. See, for example, Seipel et al., EMBO J. 11: 4961-4968 (1996). Other preferred
transactivator domains include the herpes simplex virus (HSV) VP16 domain (Hagmann
et al. (1997) J. Virol. 71: 5952-5962; Sadowski et al. (1988) Nature 335: 563-564),
transactivation domain 1 and/or domain 2 of the p65 subunit of nuclear factor-κB
(NF-κB; Schmitz, M. L. et al. (1995) J. Biol. Chem. 270: 15576-15584). Other transcription
factors are reviewed in, for example, Lekstrom-Himes J. & Xanthopoulos K. G. (C/EBP
family) J. Biol. Chem. 273: 28545-28548 (1998); Bieker, J. J. et al. (globin gene
transcription factors) Ann. N. Y. Acad. Sci. 850: 64-69 (1998), and Parker, M. G.
(estrogen receptors) Biochem. Soc. Symp. 63: 45-50 (1998).

Use of a transactivation domain from the estrogen receptor is disclosed in
Metivier, R., Petit, FG., Valotaire, Y. & Pakdel, F. (2000) Mol. Endocrinol. 14: 1849-
1871. Furthermore, activation domains from the globin transcription factors EKLF
(Pandya, K., Donze, D. & Townes T. (2001) J. Biol. Chem. 276: 8239-8243) may also be
used, as well as a transactivation domain from FKLF (Asano, H., Li, XS. &
Stamatoyannopoulos, G. (1999) Mol. Cell. Biol. 19: 3571-3579). C/EBP transactivation
domains may also be employed in the methods described herein. The C/EBP epsilon
activation domain is disclosed in Verbeek, W., Gombart, AF, Chumakov, AM, Muller, C,
Friedman, AD, & Koeffler, HP (1999) Blood 15: 3327-3337. Kowenz-Leutz, E. & Leutz,
A. (1999) Mol. Cell 4: 735-743 disclose the use of the C/EBP tau activation domain,
while the C/EBP alpha transactivation domain is disclosed in Tao, H., & Umek, RM.
(1999) DNA Cell Biol. 18: 75-84.



It is known that zinc finger proteins may be fused to transcriptional repression
domains such as the Kruppel-associated box (KRAB) domain to form powerful
repressors. These domains are known to repress expression of a reporter gene even when
bound to sites a few kilobase pairs upstream from the promoter of the gene (Margolin et
al., 1994, Proc. Natl. Acad. Sci. USA 91: 4509-4513). Hence, in certain embodiments,
the KRAB repressor domain from the human KOX-1 protein is used to repress gene
activity (Moosmann et al., Biol. Chem. 378: 669-677 (1997); Thiesen et al., New
Biologist 2: 363-374 (1990)). In additional embodiments, larger fragments of the KOX-1
protein comprising the KRAB domain, up to and including full-length KOX protein, are
used as transcriptional repression domains. See, for example, Abrink et al. (2001) Proc.
Natl. Acad. Sci. USA 98: 1422-1426. Other preferred transcriptional repressor domains
are known in the art and include, for example, the engrailed domain (Han et al., EMBO J.
12: 2723-2733 (1993)), the snag domain (Grimes et al., Mol. Cell. Biol. 16: 6263-6272
(1996)) and the transcriptional repression domain of v-erbA (e.g., Urnov et al. (2000)
EMBO J. 19: 4074-4090; Sap et al. (1989) Nature 340: 242-244 and Ciana et al. (1999)
EMBO J. 17: 7382-7394).

Biological effector domains can be covalently or non-covalently linked to a
binding domain. In one embodiment, a covalent linker comprises a flexible amino acid
sequence; fusion polypeptides according to this embodiment comprise a nucleic acid
binding domain fused, by an amino acid linker, to a biological effector domain.
Alternatively, a covalent linker may comprise a synthetic, non-amino acid based
chemical linker, for example, polyethylene glycol. Synthetic linkers are commercially
available, and methods of chemical conjugation are known in the art. Covalent linkers
may comprise flexible or structured linkers, as described above.

Non-covalent linkages between a nucleic acid binding domain and an effector
domain can be formed using, for example, leucine zipper/coiled coil domains, or other
naturally occurring or synthetic dimerisation domains. See, e.g., Luscher, B. & Larsson,
L. G. Oncogene 18:2955-2966 (1999) and Gouldson, P. R. et al.,
Neuropsychopharmacology 23: S60-S77 (2000).

The expression of composite binding polypeptides (for example, zinc finger
polypeptides) can be controlled by tissue specific promoter sequences such as, for
example, the lck promoter (thymocytes, Gu, H. et al., Science 265: 103-106 (1994)); the
human CD2 promoter (T-cells and thymocytes, Zhumabekov, T. et al., J. Immunological
Methods 185: 133-140 (1995)); the alpha A-crystallin promoter (eye lens, Lakso, M. et
al., Proc. Natl. Acad. Sci. 89: 6232-6236 (1992)); the alpha-calcium-calmodulin-dependent
kinase II promoter (hippocampus and neocortex, Tsien, J. et al., Cell 87: 1327-
1338 (1996)); the whey acidic protein promoter (mammary gland, Wagner, K.-U. et al.,
Nucleic Acids Res. 25: 4323-4330 (1997)); the aP2 enhancer/promoter (adipose tissue,
Barlow, C. et al., Nucleic Acids Res. 25: 2543-2545 (1997)); the aquaporin-2 promoter
(renal collecting duct, Nelson, R. et al., Am. J. Physiol. 275: C216-C226 (1998)); and the
mouse myogenin promoter (skeletal muscle, Grieshammer, U. et al., Dev. Biol. 197: 234-
247 (1998)). The expression of such polypeptides can also be controlled by inducible
systems, in particular by small molecule induction, such as the tetracycline-controlled
systems (tet-on and tet-off), the RU-486 or tamoxifen hormone analogue
systems, or the radiation-inducible early growth response gene-1 (EGR1) promoter.

These promoter constructs and inducible systems have the benefit of being able to
provide organ-specific and/or inducible expression of target genes for use in applications
such as gene therapy and transgenic animals.
e. Vectors



The nucleic acid encoding the nucleic acid binding polypeptide such as a zinc
finger polypeptide can be incorporated into intermediate vectors and transformed into
prokaryotic or eukaryotic cells for expression or DNA amplification.



As used herein, vector (or plasmid) preferably refers to discrete elements that are
used to introduce heterologous nucleic acid into cells for either expression or replication
thereof. The term "heterologous to the cell" means that the sequence does not naturally
exist in the genome of the host cell but has been introduced into the cell. The term
"introduced into" means that a procedure is performed on a cell, tissue, organ or organism
such that the gene encoding the nucleic acid binding polypeptide (for example, a zinc
finger polypeptide) previously absent from the cell or cells is then present in the cell or
cells. Alternatively, or in addition, the gene may be initially present in the cell or cells
and subsequently altered by introduction of heterologous DNA. A heterologous sequence
may include a modified sequence introduced at any chromosomal site, or which is not
integrated into a chromosome, or which is introduced by homologous recombination such
that it is present in the genome in the same position as the native allele. Selection and use
of such vectors are well within the skill of the person of ordinary skill in the art. Many
vectors are available, and selection of an appropriate vector will depend on the intended
use of the vector, i.e. whether it is to be used for DNA amplification or for nucleic acid
expression, the size of the DNA to be inserted into the vector, and the host cell to be
transformed with the vector, etc. Another consideration is whether the vector is to remain
episomal or integrate into the host genome. Suitable vectors may be of bacterial, viral,
insect or mammalian origin. Intermediate vectors for storage or manipulation of the
nucleic acid encoding the nucleic acid binding polypeptide, or for expression and
purification of the polypeptide are typically of prokaryotic origin. Most expression
vectors are shuttle vectors, i.e. they are capable of replication in at least one class of
organisms but can be transfected into another class of organisms for expression. For
example, a vector is cloned in E. coli and then the same vector is transfected into yeast or
mammalian cells even though it is not capable of replicating independently of the host
cell chromosome. DNA may also be replicated by insertion into the host genome. The
nucleic acid binding polypeptides such as zinc finger polypeptides described here are
preferably inserted into a vector suitable for expression in mammalian cells.

Prokaryote, yeast and higher eukaryote cells may be used for replicating DNA and
producing the nucleic acid binding protein. Suitable prokaryotes include eubacteria, such
as Gram-negative or Gram-positive organisms, such as E. coli, e.g. E. coli K-12 strains,
DH5α and HB101, or Bacilli. Further hosts suitable for the vectors include eukaryotic
microbes such as filamentous fungi or yeast, e.g. Saccharomyces cerevisiae. Higher
eukaryotic cells include insect and vertebrate cells, particularly mammalian cells
including human cells or nucleated cells from other multicellular organisms. In recent
years propagation of vertebrate cells in culture (tissue culture) has become a routine
procedure. Examples of useful mammalian host cell lines are epithelial or fibroblastic cell
lines such as Chinese hamster ovary (CHO) cells, NIH 3T3 cells, HeLa cells or 293T
cells. The host cells referred to in this disclosure comprise cells in in vitro culture as well
as cells that are within a host animal.

Each vector contains various components depending on its function (amplification
of DNA or expression of DNA) and the host cell for which it is compatible. The vector
components generally include, but are not limited to, one or more of the following: an
origin of replication, one or more selectable marker genes, a promoter, an enhancer
element, a transcription termination sequence and a signal sequence.

Both expression and cloning vectors generally contain nucleic acid sequences that
enable the vector to replicate in one or more selected host cells. Typically in cloning
vectors, this sequence is one that enables the vector to replicate independently of the host
chromosomal DNA, and includes origins of replication or autonomously replicating
sequences. Such sequences are well known for a variety of bacteria, yeast and viruses.
The origin of replication from the plasmid pBR322 is suitable for most Gram-negative
bacteria, the 2μ plasmid origin is suitable for yeast, and various viral origins (e.g. SV40,
polyoma, adenovirus) are useful for cloning vectors in mammalian cells. Generally, the
origin of replication component is not needed for mammalian expression vectors unless
these are used in mammalian cells competent for high level DNA replication, such as
COS cells.

Advantageously, an expression and cloning vector contains a selection gene, also
referred to as a selectable marker. This gene encodes a protein necessary for the survival or
growth of transformed host cells grown in a selective culture medium. Host cells not
transformed with the vector containing the selection gene will not survive in the culture
medium. Typical selection genes encode proteins that confer resistance to antibiotics and
other toxins, e.g. ampicillin, neomycin, methotrexate or tetracycline, complement
auxotrophic deficiencies, or supply critical nutrients not available from complex media.

Since the replication of vectors is conveniently done in E. coli, an E. coli genetic
marker and an E. coli origin of replication are advantageously included. These can be
obtained from E. coli plasmids, such as pBR322, Bluescript© vector or a pUC plasmid,
e.g. pUC18 or pUC19, which contain both E. coli replication origin and E. coli genetic
marker conferring resistance to antibiotics, such as ampicillin and tetracycline. Vectors
such as these are commercially available.

As to a selective gene marker appropriate for yeast, any marker gene can be used
which facilitates the selection for transformants due to the phenotypic expression of the
marker gene. Suitable markers for yeast are, for example, those conferring resistance to
antibiotics G418, hygromycin or bleomycin, or those that provide for prototrophy in an
auxotrophic yeast mutant, for example the URA3, LEU2, LYS2, TRP1, or HIS3 gene.

Suitable selectable markers for mammalian cells are those that enable the
identification of cells competent to take up nucleic acid, such as dihydrofolate reductase
(DHFR, methotrexate resistance), thymidine kinase, or genes conferring resistance to
neomycin, G418 or hygromycin. The mammalian cell transformants are placed under
selection pressure such that only those transformants which have taken up and are
expressing the marker are able to survive. In the case of a DHFR or
glutamine synthase (GS) marker, selection pressure can be imposed by culturing the
transformants under conditions in which the pressure is progressively increased, thereby
leading to amplification (at its chromosomal integration site) of both the selection gene
and the linked DNA that encodes the nucleic acid binding protein. Amplification is the
process by which genes in greater demand (such as one encoding a protein that is critical
for growth), together with closely associated genes (such as one encoding a composite
binding polypeptide), are reiterated in tandem within the chromosomes of recombinant
cells. Increased quantities of desired protein are usually synthesised from this amplified
DNA.

Expression and cloning vectors usually contain control sequences that are
recognised by the host organism and are operably linked to the nucleic acid encoding a
nucleic acid binding polypeptide. The term "control sequences" is intended to include, at
a minimum, components whose presence can influence expression, and can also include
additional components whose presence is advantageous, for example, leader sequences
and fusion partner sequences. The term "operably linked" means that the components
described are in a relationship permitting them to function in their intended manner.
Typical control sequences include promoters, enhancers and other expression regulation
signals such as terminators. Such a promoter may be inducible or constitutive. A
regulatory sequence operably linked to a coding sequence is ligated in such a way that
expression of the coding sequence is achieved under conditions compatible with the
control sequences.
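
As an illustration only, the notion of "operably linked" can be sketched as a positional check on a vector map. The sketch below is a deliberate simplification and not part of the disclosed methods: it assumes a hypothetical `Feature` record with 0-based coordinates, and it treats a promoter, coding sequence and terminator as operably linked when they lie on the same strand, do not overlap, and occur in that order in the direction of transcription. Real control elements such as enhancers can act at a distance and in either orientation, which this check does not model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical annotation of one element on a vector map."""
    kind: str        # e.g. "promoter", "cds", "terminator"
    start: int       # 0-based start coordinate
    end: int         # exclusive end coordinate
    strand: int = 1  # +1 for the top strand, -1 for the bottom strand


def is_simple_cassette(features):
    """Return True if promoter -> cds -> terminator are ordered on one strand."""
    by_kind = {f.kind: f for f in features}
    try:
        promoter = by_kind["promoter"]
        cds = by_kind["cds"]
        terminator = by_kind["terminator"]
    except KeyError:
        return False  # a required element is missing
    if not (promoter.strand == cds.strand == terminator.strand):
        return False
    if promoter.strand == 1:
        return promoter.end <= cds.start and cds.end <= terminator.start
    # On the bottom strand transcription runs towards lower coordinates.
    return terminator.end <= cds.start and cds.end <= promoter.start


cassette = [
    Feature("promoter", 0, 100),
    Feature("cds", 120, 1020),
    Feature("terminator", 1040, 1100),
]
print(is_simple_cassette(cassette))
```

The coordinates in `cassette` are invented for the example; a real vector map would supply them.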

The term promoter is well known in the art and encompasses nucleic acid regions
ranging in size and complexity from minimal promoters to promoters including upstream
elements and enhancers. Suitable promoters for use in prokaryotic and eukaryotic cells
are well known in the art, and described in, for example, Current Protocols in Molecular
Biology (Ausubel et al., eds., 1994) and Molecular Cloning: A Laboratory Manual
(Sambrook et al., 2nd ed. 1989).

Promoters suitable for use with prokaryotic hosts include, for example, the β-lactamase
and lactose promoter systems, alkaline phosphatase, the tryptophan (trp)
promoter system and hybrid promoters such as the tac promoter. Their nucleotide
sequences have been published, thereby enabling the skilled worker to ligate them to
DNA encoding a composite binding protein, using linkers or adapters to supply any
required restriction sites. Promoters for use in bacterial systems will also generally
contain an adjacent ribosome binding site (e.g., a Shine-Dalgarno sequence) operably
linked to the DNA encoding the composite binding polypeptide.

Preferred expression vectors are bacterial expression vectors, which comprise a
promoter of a bacteriophage such as phage lambda, SP6, T3 or T7, for example, which is
capable of functioning in bacteria. In one of the most widely used expression systems,
the nucleic acid encoding the fusion protein can be transcribed from a vector by T7 RNA
polymerase (Studier et al., Methods in Enzymol. 185: 60-89, 1990). In the E. coli
BL21(DE3) host strain, used in conjunction with pET vectors, the T7 RNA polymerase is
produced from the λ-lysogen DE3 in the host bacterium, and its expression is under the
control of the IPTG inducible lac UV5 promoter. This system has been employed
successfully for over-production of many proteins. Alternatively, the polymerase gene
may be introduced on a lambda phage by infection with an int- phage such as the CE6
phage, which is commercially available (Novagen, Madison, WI, USA). Other vectors
include vectors containing the lambda PL promoter such as pLEX (Invitrogen, NL),
vectors containing the trc promoters such as pTrcHisXpress™ (Invitrogen), or pTrc99
(Pharmacia Biotech, SE), or vectors containing the tac promoter such as pKK223-3
(Pharmacia Biotech), or pMAL (New England Biolabs, Beverly, MA, USA). A suitable
vector for expression of proteins in mammalian cells is the CMV enhancer-based vector
such as pEVRF (Matthias et al. (1989) Nucleic Acids Res. 17, 6418).

Suitable promoting sequences for use with yeast hosts may be regulated or
constitutive and are preferably derived from a highly expressed yeast gene, especially a
Saccharomyces cerevisiae gene. Thus, the promoter of the TRP1 gene, the ADHI or
ADHII gene, the acid phosphatase (PHO5) gene, a promoter of the yeast mating
pheromone genes coding for the a- or α-factor or a promoter derived from a gene
encoding a glycolytic enzyme such as the promoter of the enolase, glyceraldehyde-3-
phosphate dehydrogenase (GAP), 3-phosphoglycerate kinase (PGK), hexokinase,
pyruvate decarboxylase, phosphofructokinase, glucose-6-phosphate isomerase, 3-
phosphoglycerate mutase, pyruvate kinase, triose phosphate isomerase, phosphoglucose
isomerase or glucokinase genes, or a promoter from the TATA binding protein (TBP)
gene can be used. Furthermore, it is possible to use hybrid promoters comprising
upstream activation sequences (UAS) of one yeast gene and downstream promoter
elements including a functional TATA box of another yeast gene, for example a hybrid
promoter including the UAS(s) of the yeast PHO5 gene and downstream promoter
elements including a functional TATA box of the yeast GAP gene (PHO5-GAP hybrid
promoter). A suitable constitutive PHO5 promoter is, for example, a shortened acid
phosphatase PHO5 promoter devoid of the upstream regulatory elements (UAS) such as
the PHO5 (-173) promoter element starting at nucleotide -173 and ending at nucleotide -9
of the PHO5 gene.

The promoter is typically selected from promoters which are found in animal
cells, although prokaryotic promoters and promoters functional in other eukaryotic cells
can be used. Typically, the promoter is derived from viral or animal gene sequences, may
be constitutive or inducible, and may be strong or weak.

Viral promoters can be derived from viruses such as polyoma virus, adenoviruses,
adeno-associated viruses, poxviruses (e.g., fowlpox virus), papilloma viruses (e.g., BPV),
avian sarcoma virus, cytomegalovirus (CMV), herpesviruses, retroviruses, lentiviruses
and simian virus 40 (SV40). An example of a relatively weak viral promoter is the thymidine
kinase promoter from herpes simplex virus (HSV-TK).

Mammalian derived promoters can be heterologous to the animal in which
composite binding polypeptide (such as zinc finger polypeptide) expression is to occur, or
they can be host sequences. In some applications it is preferable to use a promoter that is
active in all cell types; however, it is often preferable to use promoter sequences that are
active in specific cell types only.

The actin promoter and the strong ribosomal protein promoter are examples of
promoter sequences that are active in all cell types. In contrast, by using promoters that
are specific for certain cell or tissue types, the gene encoding the nucleic acid binding
polypeptide can be expressed only in the required cell or tissue types. This may be of
extreme importance for applications such as gene therapy, and for the production of
viable transgenic animals. Such promoters are known in the art and include the lck
promoter (thymocytes, Gu, H. et al., Science 265: 103-106 (1994)), the human CD2
promoter (T-cells and thymocytes, Zhumabekov, T. et al., J. Immunological Methods
185: 133-140 (1995)), the alpha A-crystallin promoter (eye lens, Lakso, M. et al., Proc.
Natl. Acad. Sci. 89: 6232-6236 (1992)), the alpha-calcium-calmodulin-dependent kinase
II promoter (hippocampus and neocortex, Tsien, J. et al., Cell 87: 1327-1338 (1996)), the
whey acidic protein promoter (mammary gland, Wagner, K.-U. et al., Nucleic Acids Res.
25: 4323-4330 (1997)), the aP2 enhancer/promoter (adipose tissue, Barlow, C. et al.,
Nucleic Acids Res. 25: 2543-2545 (1997)), the aquaporin-2 promoter (renal collecting
duct, Nelson, R. et al., Am. J. Physiol. 275: C216-C226 (1998)), the mouse myogenin
promoter (skeletal muscle, Grieshammer, U. et al., Dev. Biol. 197: 234-247 (1998)), and the
retinoblastoma gene promoter (nervous system, Jiang, Z. et al., J. Biol. Chem. 276: 593-
600 (2001)).

The expression of nucleic acid binding polypeptides such as zinc finger
polypeptides can also be controlled by small molecule induction or other inducible
systems such as the tetracycline inducible systems (tet-on and tet-off), the RU-486 or
tamoxifen hormone analogue systems, or the radiation-inducible early growth response
gene-1 (EGR1) promoter, all of which are commercially available. By using such
inducible promoter systems, transgenic lines can be established which carry a zinc finger
chimeric polypeptide but express it only after addition of an inducer molecule. Thus the
genes encoding the zinc finger polypeptides or other nucleic acid binding polypeptides
can be expressed (or not expressed) in response to the small molecule, which can be
easily administered. These systems may also allow the time and amount of polypeptide
expression to be regulated.
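
The on/off behaviour of the two tetracycline-controlled systems mentioned above can be summarised in a few lines. The following is an idealised sketch, not a model of any particular commercial kit: the function name `transgene_expressed` is hypothetical, and the all-or-nothing logic ignores dose dependence and leaky expression. It relies only on the commonly described behaviour that a tet-on system is active in the presence of the inducer (tetracycline or an analogue such as doxycycline), while a tet-off system is active in its absence.

```python
def transgene_expressed(system, inducer_present):
    """Idealised expression state of a tetracycline-controlled transgene."""
    if system == "tet-on":
        return inducer_present       # inducer switches expression on
    if system == "tet-off":
        return not inducer_present   # inducer switches expression off
    raise ValueError(f"unknown inducible system: {system!r}")


for system in ("tet-on", "tet-off"):
    for inducer in (False, True):
        print(system, inducer, transgene_expressed(system, inducer))
```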

Expression vectors typically contain expression cassettes that carry all the 
additional elements required for efficient expression of the nucleic acid in the host cell. 
Additional elements are enhancer sequences, polyadenylation and transcriptional 
termination signals, ribosome binding sites, and translational termination sequences. 
Transcription of DNA by higher eukaryotes may be increased by inserting an
enhancer sequence into the vector. Enhancers are relatively orientation and position
independent. Many enhancer sequences are known from mammalian genes (e.g. elastase
and globin). However, typically one will employ an enhancer from a eukaryotic cell
virus. Examples include the SV40 enhancer on the late side of the replication origin
(approx. bp 100-270) and the CMV early promoter enhancer. The enhancer may be
spliced into the vector at a position 5' or 3' to the gene encoding the zinc finger
polypeptide or nucleic acid binding polypeptide, but is preferably located at a site 5' from
the promoter.

It has also been shown that the expression of a heterologous gene in an animal cell
may be enhanced by retaining intron sequences (as opposed to using a cDNA clone). For
example, intron 1 of the human CD2 gene has been shown to enhance the level of
expression of CD2 in human cells (Festenstein, R. et al. 1996 Science 271: 1123).

Advantageously, a eukaryotic expression vector encoding a nucleic acid binding
protein may comprise a locus control region (LCR). LCRs are capable of directing high-
level integration site-independent expression of transgenes integrated into host cell
chromatin. This is particularly important where the gene encoding the zinc finger
polypeptide or the nucleic acid binding polypeptide is to be expressed over extended
periods of time, for applications such as transgenic animals and gene therapy, as gene
silencing of integrated heterologous DNA - especially of viral origin - is known to occur
(Palmer, T. D. et al., Proc. Natl. Acad. Sci. USA 88: 1330-1334 (1991); Harbers, K. et al.,
Nature 293: 540-542 (1981); Jahner, D. et al., Nature 298: 623-628 (1992); and Chen, W.
Y. et al., Proc. Natl. Acad. Sci. USA 94: 5798-5803 (1997)). Typical LCRs are
exemplified by the human β-globin cluster, and the HS-40 regulatory region from the α-
globin locus.

Eukaryotic vectors may also contain sequences necessary for the termination of
transcription and for stabilising the mRNA transcript. Such sequences are commonly
available from the 5' and 3' untranslated regions of eukaryotic or viral DNAs, and are
known in the art. These regions contain nucleotide segments transcribed as
polyadenylated fragments in the untranslated portion of the mRNA encoding the relevant
polypeptide. An appropriate terminator of transcription is fused downstream of the gene
encoding the selected nucleic acid binding polypeptide such as a zinc finger protein. Any
of a number of known transcriptional terminators, RNA polymerase pause sites and
polyadenylation enhancing sequences can be used at the 3' end of the nucleic acid
encoding, for example, a zinc finger polypeptide (see, for example, Richardson, J. P. Crit.
Rev. Biochem. Mol. Biol. 28: 1-30 (1993); Yonaha, M. & Proudfoot, N. J. EMBO J. 19:
3110-3111 (2000); Ashfield, R. et al. EMBO J. 10: 4197-4207 (1991); Hirose, Y. &
Manley, J. L. Nature 395: 93-96 (1998)).

The nucleic acid binding polypeptides are generally targeted to the cell nucleus so
that they are able to interact with host cell DNA and bind to the appropriate DNA target
in the nucleus and regulate transcription. To effect this, a nuclear localisation sequence
(NLS) is incorporated in frame with the expressible nucleic acid binding polypeptide
(e.g., zinc finger polypeptide) gene construct. The NLS can be fused either 5' or 3' to the
sequence encoding the binding protein, but preferably it is fused to the C-terminus of the
chimeric polypeptide.
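
What "in frame" means for a C-terminal NLS fusion can be sketched computationally. The example below is illustrative only and makes several assumptions that are not taken from this disclosure: the helper names `translate` and `fuse_in_frame` are hypothetical, the four-codon "binding domain" sequence is a placeholder rather than a real zinc finger gene, and the NLS coding sequence is simply one possible set of codons for the widely cited SV40 large T-antigen NLS peptide PKKKRKV. The sketch joins the two coding sequences, appends a stop codon, and confirms by translation with the standard genetic code that the reading frame is maintained.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
STOP_CODONS = {"TAA", "TAG", "TGA"}

# One possible coding sequence for PKKKRKV (codon choice is an assumption).
SV40_NLS_CDS = "CCAAAGAAGAAGCGCAAGGTG"


def translate(dna):
    """Translate a DNA coding sequence codon by codon ('*' marks a stop)."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence is not a whole number of codons")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def fuse_in_frame(binding_cds, nls_cds, stop="TAA"):
    """Append an NLS to the 3' end of a coding sequence, keeping the frame."""
    binding_cds, nls_cds = binding_cds.upper(), nls_cds.upper()
    if binding_cds[-3:] in STOP_CODONS:
        binding_cds = binding_cds[:-3]  # drop the original stop codon
    for name, seq in (("binding domain", binding_cds), ("NLS", nls_cds)):
        if len(seq) % 3:
            raise ValueError(f"{name} sequence is not a whole number of codons")
    fused = binding_cds + nls_cds + stop
    if "*" in translate(fused)[:-1]:
        raise ValueError("internal stop codon in fusion")
    return fused


placeholder_cds = "ATGGCTGAGCGC"  # hypothetical stand-in for a binding domain
fusion = fuse_in_frame(placeholder_cds, SV40_NLS_CDS)
print(translate(fusion))
```

A sequence whose length is not a multiple of three would shift the NLS out of frame; the sketch rejects such input with a `ValueError` instead of silently producing a garbled fusion.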

The NLS of the wild-type Simian Virus 40 Large T-Antigen (Kalderon et al.
(1984) Cell 37: 801-813; and Markland et al. (1987) Mol. Cell. Biol. 7: 4255-4265) is an
appropriate NLS and provides an effective nuclear localisation mechanism in animals.
However, several alternative NLSs are known in the art and can be used instead of the
SV40 NLS sequence. These include the NLSs of TGA-1A and TGA-1B.

Composite binding polypeptides can comprise tag sequences to facilitate studies
and/or preparation of such molecules. Tag sequences may include FLAG-tags, myc-tags,
6his-tags, hemagglutinin tags or any other suitable tag known in the art.

Moreover, the nucleic acid binding protein gene according to the invention
preferably includes a secretion sequence in order to facilitate secretion of the polypeptide
from bacterial hosts, such that it will be produced as a soluble native peptide rather than
in an inclusion body. The peptide may be recovered from the bacterial periplasmic space,
or the culture medium, as appropriate.

Construction of vectors employs conventional ligation techniques. Isolated
plasmids or DNA fragments are cleaved, tailored, and religated in the form desired to
generate the plasmids required. If desired, analysis to confirm correct sequences in the
constructed plasmids is performed in a known fashion. Suitable methods for constructing
expression vectors, preparing in vitro transcripts, introducing DNA into host cells, and
performing analyses for assessing nucleic acid binding protein expression and function
are known to those skilled in the art. Gene presence, amplification and/or expression
may be measured in a sample directly, for example, by conventional Southern blotting,
Northern blotting to quantify the transcription of mRNA, dot blotting (DNA or RNA
analysis), or in situ hybridisation, using an appropriately labelled probe which may be
based on a sequence provided herein. Those skilled in the art will readily envisage how
these methods may be modified, if desired.

f. Applications of Composite Binding Polypeptides 

Nucleic acid binding proteins according to the invention can be employed in a wide 
variety of applications, including diagnostics and as research tools, and also in therapeutic 
applications and in transgenic organisms. 

In Vitro Applications 

Poly-zinc finger peptides of this invention may be employed as diagnostic tools for 
identifying the presence of nucleic acid molecules in a complex mixture. Nucleic acid 
binding molecules according to the invention can differentiate single base pair changes in 
target nucleic acid molecules. 
Accordingly, the invention provides methods for determining the presence of a target
nucleic acid molecule, wherein the target nucleic acid molecule comprises a target
sequence, comprising the steps of:

a) preparing a nucleic acid binding protein, by a method set forth above, which is specific
for the target nucleic acid sequence;

b) exposing a test system to the nucleic acid binding protein under conditions which
promote binding of the protein to the target sequence, and removing any nucleic acid
binding protein which remains unbound;

c) testing for the presence of the nucleic acid binding protein in the test system;

wherein, if the nucleic acid binding protein is detected, the target nucleic acid molecule is
present and, if the nucleic acid binding protein is not detected, the target nucleic acid
molecule is not present. In additional embodiments, quantitation of the amount of nucleic
acid binding protein allows quantitation of the amount of the target nucleic acid molecule
present in the test system.
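
The read-out logic of steps (a) to (c) can be sketched as a simple calculation. This is a hedged illustration under stated assumptions, not a protocol from this disclosure: it assumes that retained binding protein is measured as a numeric signal, that "detected" means a signal exceeding the mean of blank wells by `k` standard deviations (a common convention, chosen here arbitrarily as `k = 3`), and that signal is linear in target amount over the range of a set of standards. The function names and all numbers are hypothetical.

```python
def target_present(signal, blank_signals, k=3.0):
    """Call the target present if signal exceeds blank mean + k * blank SD."""
    n = len(blank_signals)
    mean = sum(blank_signals) / n
    variance = sum((b - mean) ** 2 for b in blank_signals) / (n - 1)
    return signal > mean + k * variance ** 0.5


def fit_line(amounts, signals):
    """Least-squares slope and intercept of signal against known target amount."""
    n = len(amounts)
    mean_x = sum(amounts) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, signals))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_amount(signal, slope, intercept):
    """Invert the standard curve to estimate the amount of target."""
    return (signal - intercept) / slope


# Invented example data: standards of known target amount and their signals.
slope, intercept = fit_line([0.0, 1.0, 2.0, 4.0], [0.1, 0.6, 1.1, 2.1])
blanks = [0.09, 0.10, 0.11]
sample_signal = 1.6
if target_present(sample_signal, blanks):
    print(estimate_amount(sample_signal, slope, intercept))
```

In practice the threshold and the linear range would have to be established empirically for the particular binding protein and detection chemistry.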

In a preferred embodiment, the nucleic acid binding molecules of the invention can be
incorporated into an ELISA assay. For example, phage displaying composite binding
polypeptides can be used to detect the presence of the target nucleic acid, and visualised
using enzyme-linked anti-phage antibodies.

Further improvements to the use of phage expressing a composite binding polypeptide for
diagnosis can be made, for example, by co-expressing a marker protein fused to the minor
coat protein (gVIII) of a filamentous bacteriophage. Since detection with an anti-phage
antibody would then be unnecessary, the time and cost of each diagnosis would be further
reduced. Depending on the requirements, suitable markers for display might include
fluorescent proteins (A. B. Cubitt, et al., (1995) Trends Biochem. Sci. 20, 448-455; T. T.
Yang, et al., (1996) Gene 173, 19-23), or an enzyme such as alkaline phosphatase (J.
McCafferty, R. H. Jackson, D. J. Chiswell, (1991) Protein Engineering 4, 955-961).
Labelling different types of diagnostic phage with distinct markers would allow multiplex
screening of a single nucleic acid sample. Nevertheless, even in the absence of such
refinements, the basic ELISA technique is reliable, fast, simple and particularly
inexpensive. Moreover it requires no specialised apparatus, nor does it employ hazardous
reagents such as radioactive isotopes, making it amenable to routine use in the clinic. The
major advantage of the protocol is that it obviates the requirement for gel electrophoresis,
and so opens the way to automated nucleic acid diagnosis.

The invention provides nucleic acid binding proteins that have exquisite specificity. The
invention lends itself, therefore, to the design of any molecule of which specific nucleic
acid binding is required. For example, the proteins according to the invention may be
employed in the manufacture of chimeric restriction enzymes, in which a nucleic acid
cleaving domain is fused to a nucleic acid binding domain comprising a zinc finger as
described herein.

In Vivo Applications 

The invention further provides composite binding polypeptides (and nucleic acids
encoding them) that may be used in transgenic organisms (such as non-human animals),
as therapeutic agents, and in gene therapy applications.

A transgenic animal is an animal, preferably a non-human animal, containing at least one
foreign gene, called a transgene, in its genetic material. Preferably, the transgene is
contained in the animal's germ line such that it can be transmitted to the animal's
offspring. Transgenic animals may carry the transgene in all their cells or may be
genetically mosaic.

Constructs useful for creating transgenic animals according to the invention comprise
genes encoding nucleic acid binding polypeptides, optionally under the control of nucleic
acid sequences directing their expression in cells of a particular lineage. Alternatively,
nucleic acid binding polypeptide encoding constructs may be under the control of non-
lineage-specific promoters, and/or inducibly regulated. Typically, DNA fragments on the
order of 10 kilobases or less are used to construct a transgenic animal (Reeves, 1998,
New. Anat., 253:19). A transgenic animal expressing one transgene can be crossed to a
second transgenic animal expressing a second transgene such that their offspring will carry
both transgenes.

Although the majority of previous studies have involved transgenic mice, other species of 
transgenic animal have also been produced, such as rabbits, sheep, pigs (Hammer et al., 
1985, Nature 315:680-683; Kumar et al., U.S. Patent No. 5,922,854; Seebach et al., U.S. 
Patent No. 6,030,833) and chickens (Salter et al., 1987, Virology 157:236-240). Transgenic 
animals are currently being developed to serve as bioreactors for the production of useful 
pharmaceutical compounds (Van Brunt, 1988, Bio/Technology 6:1149-1154; Wilmut et 
al., 1988, New Scientist (July 7 issue) pp. 56-59). Up-regulation of endogenous or 
exogenous genes expressing useful polypeptides, such as therapeutic polypeptides, by 
means of a heterologous nucleic acid binding polypeptide, may be used to produce such 
polypeptides in transgenic animals. Preferably, the polypeptides are secreted into an 
extractable fluid, such as blood or mammary fluid (milk), to enable easy isolation of the 
polypeptide. 

Furthermore, the invention provides the use of polypeptide fusions comprising an 
integrase, such as a viral integrase, and a nucleic acid binding protein according to the 
invention to target nucleic acid sequences in vivo (Bushman, (1994) PNAS (USA) 
91:9233-9237). In gene therapy applications, the method may be applied to the delivery 
of functional genes into defective genes, or the delivery of a heterologous nucleic acid in 
order to disrupt an endogenous gene. Alternatively, genes may be delivered to known, 
repetitive stretches of nucleic acid, such as centromeres, together with an activating 
sequence such as an LCR. This would represent a route to the safe and predictable 
incorporation of nucleic acid into the genome. 

In conventional therapeutic applications, nucleic acid binding proteins according to this 
embodiment may be used to specifically eliminate cells having mutant vital proteins. For 
example, if a mutant ras gene is targeted, cells comprising this mutant gene will be 
destroyed because ras is essential to cellular survival. Alternatively, the action of 
transcription factors can be modulated, preferably reduced, by administering to the cell 
agents which bind to the binding site specific for the transcription factor. For example, 
the activity of HIV tat may be reduced by binding proteins specific for HIV TAR. 

Moreover, binding proteins according to the invention can be coupled to toxic molecules, 
such as nucleases, which are capable of causing irreversible nucleic acid damage and cell 
death. Such agents are capable of selectively destroying cells that comprise a mutation in 
their endogenous nucleic acid. 

Nucleic acid binding proteins and derivatives thereof as set forth above may also be 
applied to the treatment of infections and the like in the form of organism-specific 
antibiotic or antiviral drugs. In such applications, the binding proteins can be coupled to 
a nuclease or other nuclear toxin and targeted specifically to the nucleic acids of 
microorganisms. 

Transgenic animals comprising transgenes, optionally integrated within the genome, and 
expressing heterologous zinc finger and other nucleic acid binding polypeptides from 
transgenes, may be created by a variety of methods. Methods for producing transgenic 
animals are known in the art, and are described by Gordon, J. & Ruddle, F.H. Science 
214: 1244-1246 (1981); Jaenisch, R. Proc. Natl. Acad. Sci. USA 73: 1260-1264 (1976); 
Gossler et al. (1986) Proc. Natl. Acad. Sci. USA 83:9065-9069; Hogan et al., 
Manipulating the Mouse Embryo: A Laboratory Manual (1988); and U.S. Pat. Nos. 
5,175,384; 5,434,340 and 5,591,669. 

Pharmaceutical Preparations 

The invention likewise relates to pharmaceutical preparations which contain the 
compounds according to the invention or pharmaceutically acceptable salts thereof as 
active ingredients, and to processes for their preparation. 



The pharmaceutical preparations according to the invention which contain the compound 
according to the invention or pharmaceutically acceptable salts thereof are those for 
enteral, such as oral, furthermore rectal, and parenteral administration to (a) 
warm-blooded animal(s), the pharmacologically active ingredient being present on its own or 
together with a pharmaceutically acceptable carrier. The daily dose of the active 
ingredient depends on the age and the individual condition and also on the manner of 
administration. 

The novel pharmaceutical preparations contain, for example, from about 10% to about 
80% (or any integral percentage therebetween), preferably from about 20% to about 
60%, of the active ingredient. Pharmaceutical preparations according to the invention for 
enteral or parenteral administration are, for example, those in unit dose forms, such as 
sugar-coated tablets, tablets, capsules or suppositories, and furthermore ampoules. These 
are prepared in a manner known per se, for example by means of conventional mixing, 
granulating, sugar-coating, dissolving or lyophilising processes. Thus, pharmaceutical 
preparations for oral use can be obtained by combining the active ingredient with solid 
carriers, if desired granulating a mixture obtained, and processing the mixture or granules, 
if desired or necessary, after addition of suitable excipients to give tablets or sugar-coated 
tablet cores. 

Suitable carriers are, in particular, fillers, such as sugars, for example lactose, sucrose, 
mannitol or sorbitol, cellulose preparations and/or calcium phosphates, for example 
tricalcium phosphate or calcium hydrogen phosphate, furthermore binders, such as starch 
paste, using, for example, corn, wheat, rice or potato starch, gelatin, tragacanth, 
methylcellulose and/or polyvinylpyrrolidone, if desired, disintegrants, such as the 
abovementioned starches, furthermore carboxymethyl starch, crosslinked 
polyvinylpyrrolidone, agar, alginic acid or a salt thereof, such as sodium alginate; 
auxiliaries are primarily glidants, flow-regulators and lubricants, for example silicic acid, 
talc, stearic acid or salts thereof, such as magnesium or calcium stearate, and/or 
polyethylene glycol. Sugar-coated tablet cores are provided with suitable coatings which, 
if desired, are resistant to gastric juice, using, inter alia, concentrated sugar solutions 
which, if desired, contain gum arabic, talc, polyvinylpyrrolidone, polyethylene glycol 
and/or titanium dioxide, coating solutions in suitable organic solvents or solvent mixtures 
or, for the preparation of gastric juice-resistant coatings, solutions of suitable cellulose 
preparations, such as acetylcellulose phthalate or hydroxypropylmethylcellulose 
phthalate. Colorants or pigments, for example to identify or to indicate different doses of 
active ingredient, may be added to the tablets or sugar-coated tablet coatings. 

Other orally utilisable pharmaceutical preparations are hard gelatin capsules, and also soft 
closed capsules made of gelatin and a plasticiser, such as glycerol or sorbitol. The hard 
gelatin capsules may contain the active ingredient in the form of granules, for example in 
a mixture with fillers, such as lactose, binders, such as starches, and/or lubricants, such as 
talc or magnesium stearate, and, if desired, stabilisers. In soft capsules, the active 
ingredient is preferably dissolved or suspended in suitable liquids, such as fatty oils, 
paraffin oil or liquid polyethylene glycols, it also being possible to add stabilisers. 

Suitable rectally utilisable pharmaceutical preparations are, for example, suppositories, 
which consist of a combination of the active ingredient with a suppository base. Suitable 
suppository bases are, for example, natural or synthetic triglycerides, paraffin 
hydrocarbons, polyethylene glycols or higher alkanols. Furthermore, gelatin rectal 
capsules which contain a combination of the active ingredient with a base substance may 
also be used. Suitable base substances are, for example, liquid triglycerides, polyethylene 
glycols or paraffin hydrocarbons. 

Suitable preparations for parenteral administration are primarily aqueous solutions of an 
active ingredient in water-soluble form, for example a water-soluble salt, and furthermore 
suspensions of the active ingredient, such as appropriate oily injection suspensions, using 
suitable lipophilic solvents or vehicles, such as fatty oils, for example sesame oil, or 
synthetic fatty acid esters, for example ethyl oleate or triglycerides, or aqueous injection 
suspensions which contain viscosity-increasing substances, for example sodium 
carboxymethylcellulose, sorbitol and/or dextran, and, if necessary, also stabilisers. 

The dose of the active ingredient depends on the warm-blooded animal species, the age 
and the individual condition and on the manner of administration. For example, an 
approximate daily dose of about 10 mg to about 250 mg is to be estimated in the case of 
oral administration for a patient weighing approximately 75 kg. 

g. Transformation and Transfection 

DNA can be stably incorporated into cells or can be transiently expressed using 
methods known in the art and described below. Stably transfected cells can be prepared 
by transfecting cells with an expression vector containing a selectable marker gene, and 
growing the transfected cells under conditions selective for cells expressing the marker 
gene. To prepare transient transfectants, cells are transfected with a reporter gene to 
monitor transfection efficiency. 

There are many well-known methods of introducing foreign nucleic acids into 
host cells, which include electroporation, calcium phosphate co-precipitation, particle 
bombardment, microinjection, naked DNA, liposomes, lipofection, and viral infection 
(see, e.g., Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual, Second 
Edition, Cold Spring Harbor Laboratory Press, and Mountain, A. Trends Biotechnol. 18: 
119-128 (2000) for a review). Any of the above methods can be used, as long as it is 
compatible with the host cell. Linear nucleic acid molecules have been found to be more 
efficiently incorporated into mammalian genomes than circular plasmids. Additionally, 
nucleic acid molecules may be delivered to specific target tissues or to individual cells. 
Viral based gene transfer is often favoured for introducing nucleic acids into mammalian 
cells and specific target tissues, and several viral delivery approaches are in clinical trials 
for gene therapy applications. However, non-viral methods are attractive due to their 
greater safety for the purpose of gene transfer to humans. 

The preferred methods of particle bombardment use biolistics made from gold (or 
tungsten). Compared with other transfection procedures, particle bombardment requires a 
low amount of nucleic acid and a smaller number of cells, making the procedure 
generally more efficient (Heiser, W. C. Anal. Biochem. 217: 185-196 (1994); Klein, T. M. 
& Fitzpatrick-McElligott, S. Curr. Opin. Biotechnol. 4: 583-590 (1993)). The procedure 
is particularly suited for organisms that are difficult to transfect, and for introducing DNA 
into organelles, such as mitochondria and chloroplasts. Although generally used for ex 
vivo applications, the procedure is also suitable for in vivo transfection of skin tissue. 
Suitable methods are known in the art and described, for instance, in US Patent Nos. 
5,489,520 and 5,550,318. See also, Potrykus (1990) Bio/Technol. 8: 535-542; and 
Finnegan et al. (1994) Bio/Technol. 12: 883-888. 

Microinjection is a common method of nucleic acid delivery to isolated cells 
(Palmiter, R. D. & Brinster, R. L. Annu. Rev. Genet. 20: 465-499 (1986); Wall, R. J. et 
al., J. Cell Biochem. 49: 113-120 (1992); Chan, A. W. et al., Proc. Natl. Acad. Sci. USA 
95: 14028-14033 (1998)). DNA is generally injected into cells and the cells may then be 
re-introduced into animals. Procedures for such a technique are described in US Pat. Nos. 
5,175,384 and 5,434,340, and improvements to the technique are described in WO 
00/69257. 

Efficient gene transfer in vivo can be obtained following local injection of 
naked DNA. While expression of injected DNA in skin lasts for only a few days, injected 
DNA in mouse skeletal muscle has been shown to last for up to nine months (Wolff, J. A. 
et al., Hum. Mol. Genet. 1: 363-369 (1992)). Naked DNA is particularly suited to gene 
therapy for preventive and therapeutic vaccines. 

Cationic liposomes containing cholesterol are particularly suited for delivery of 
nucleic acids to humans as they are biodegradable and stable in the bloodstream. 
Liposomes can be injected intravenously, subcutaneously or inhaled as an aerosol. 
Stribling et al. (1992) Proc. Natl. Acad. Sci. USA 89:11,277-11,281. Liposomes can be 
targeted to certain cell types by incorporating ligands, receptors or antibodies 
(immunolipids) into the lipid membrane (U.S. Pat. No. 4,957,773). On contacting target 
cells, entry of DNA from liposomes is via endocytosis and diffusion. Preparations of 
lipid formulations are commercially available and methods for their use are well 
documented (Bogdanenko, E. V. et al., Vopr. Med. Khim. 46: 226-245 (2000); Natsume, 
A. et al., Gene Ther. 6: 1626-1633 (1999)). 

Uptake of DNA into animal cells can also be enhanced by using transfection 
agents. "Transfecting agent", as utilised herein, means a composition of matter added to 
the genetic material for enhancing the uptake of exogenous DNA segment(s) into a 
eukaryotic cell, preferably a mammalian cell, and more preferably a mammalian germ 
cell. The enhancement is measured relative to the uptake in the absence of the 
transfecting agent. Examples of transfecting agents include 
adenovirus-transferrin-polylysine-DNA complexes. These complexes generally augment the 
uptake of DNA into the cell and reduce its breakdown during its passage through the 
cytoplasm to the nucleus of the cell. These complexes can be targeted to the male germ 
cells using specific ligands which are recognised by receptors on the cell surface of the 
germ cell, such as the c-kit ligand or modifications thereof. Other preferred transfecting 
agents include lipofectin™, lipofectamine™, DMRIE-C, Superfect, and Effectene (Qiagen), 
unifectin, maxifectin, DOTMA, DOGS (Transfectam; dioctadecylamidoglycylspermine), 
DOPE (1,2-dioleoyl-sn-glycero-3-phosphoethanolamine), DOTAP 
(1,2-dioleoyl-3-trimethylammonium propane), DDAB (dimethyl dioctadecylammonium 
bromide), DHDEAB (N,N-di-n-hexadecyl-N,N-dihydroxyethyl ammonium bromide), HDEAB 
(N-n-hexadecyl-N,N-dihydroxyethylammonium bromide), polybrene, or poly(ethylenimine) 
(PEI). See, for example, Banerjee, R. et al., Novel series of non-glycerol-based cationic 
transfection lipids for use in liposomal gene delivery, J. Med. Chem. 42 (21): 4292-99 
(1999); Godbey, W. T. et al., Improved packing of poly(ethylenimine)-DNA complexes 
increases transfection efficiency, Gene Ther. 6 (8): 1380-88 (1999); Kichler, A. et al., 
Influence of the DNA complexation medium on the transfection efficiency of 
lipospermine/DNA particles, Gene Ther. 5 (6): 855-60 (1998); Birchall, J. C. et al., 
Physico-chemical characterisation and transfection efficiency of lipid-based gene delivery 
complexes, Int. J. Pharm. 183 (2): 195-207 (1999). These non-viral agents have the 
advantage that they facilitate stable integration of xenogeneic DNA sequences into the 
vertebrate genome, without size restrictions commonly associated with virus-derived 
transfecting agents. 

The most critical issues for applications such as gene therapy are the efficient 
delivery and appropriate expression of transgenes in host cells. For this purpose, viral 
systems are particularly well suited as viruses have evolved to efficiently cross the plasma 
membrane of eukaryotic cells and express their nucleic acids in host cells. Suitability of 
viral vectors is assessed primarily on their ability to carry foreign nucleic acids and 
deliver and express transgenes with high efficiency. Current applications utilise both 
RNA and DNA virus based systems, and 70% of gene therapy trials use viral vectors 
derived from retroviruses, adenovirus, adeno-associated virus, herpesvirus and pox virus. 
See, for example, Flotte et al. (1995) Gene Ther. 2:357-362; Glorioso et al. (1995) Ann. 
Rev. Microbiol. 49:675-710; Smith (1995) Ann. Rev. Microbiol. 49:807-838; Prince 
(1998) Pathology 30:335-347; and Robbins et al. (1998) Pharmacol. Ther. 80:35-47. 
Retroviruses represent the most prominent gene delivery system as they mediate high 
gene transfer and expression of therapeutic genes. Members of the DNA virus family 
such as adenovirus, adeno-associated virus or herpesvirus are popular due to their 
efficiency of gene delivery. Adenoviral vectors are particularly suited when transient 
transfection of nucleic acid is preferred. Retroviruses express particular envelope 
proteins that bind to specific cell surface receptors on host cells, in order for the virus to 
enter the cell. Hence, the type of viral vector used should be determined by the tissue type 
to be targeted. See, e.g., Dornburg (1995) Gene Ther. 2:301-310; Gunzburg et al. (1996) 
J. Mol. Med. 74:171-182; Vile et al. (1996) Mol. Biotechnol. 5:139-158; Miller (1997) 
"Development and Applications of Retroviral Vectors" Cold Spring Harbor Laboratory 
Press, Cold Spring Harbor, New York; Karavanas et al. (1998) Crit. Rev. Oncol. 
Hematol. 28:7-30; Hu et al. (2000) Pharmacol. Rev. 52: 493-511; and Walther et al. 
(2000) Drugs 60: 249-271 for reviews. 

Safety is a critical issue for viral-based gene delivery because most viruses are 
either pathogens or have pathogenic potential. Generally, when a replication-competent 
virus infects an animal cell it can express viral genes and release many new infectious 
viral particles in the host organism. Hence, it is very important that during transgene 
delivery the host animal does not receive a pathogenic virus with full replication 
potential. For this reason, viral-host cell systems have been developed for gene therapy 
treatments to prevent the creation of replication-competent viruses. In this method, viral 
components are divided between a vector and a helper construct to limit the ability of the 
virus to replicate (Miller 1997). The viral vector contains the gene(s) of interest and 
cis-acting elements that allow gene expression and replication, but contains deletions of 
some or all of the viral proteins. Helper cells (or occasionally, helper virus) are engineered 
to express the viral proteins needed to propagate the viral vectors. These new viral particles 
are able to infect target cells, reverse transcribe the vector RNA and integrate its DNA 
copy into the genome of the host, which can then be expressed. However, the vector 
cannot express the viral proteins required to create new infectious particles. Helper cell lines 
are known in the art (see Hu, W-S & Pathak, V. K. Pharmacol. Rev. 52: 493-511 (2000), 
for a review). 

In general, retroviral vectors are able to package reasonably long stretches of 
foreign DNA (up to 10 kb). Oncoviruses are a type of retrovirus, which only infect 
rapidly dividing cells. For this reason they are especially attractive for cancer therapy. 
Murine leukaemia virus (MLV)-based vectors are the most commonly used of this class. 
Spleen necrosis virus (SNV), Rous sarcoma virus and avian leukosis virus are other types. 
Lentiviral vectors are retroviral vectors that can be propagated to produce high viral titres 
and are able to infect non-dividing cells. They are more complex than oncoviruses and 
require regulation of their replication cycle. Lentiviral vectors which may be used 
include human immunodeficiency virus (HIV-1 and -2) and simian immunodeficiency 
virus (SIV) based systems. HIV infects cells of the immune system, most importantly 
CD4+ T-lymphocytes, and so may be useful for targeted gene therapy of this cell type. 
Another type of retrovirus is the spumavirus. Spumaviruses are attractive because of 
their apparent lack of toxicity. Linial (1999) J. Virol. 73:1747-1755. 

Adenoviral vectors have high transduction efficiency and are able to transfect a 
number of different cell types, including non-dividing cells. They have a high capacity for 
foreign DNA and can carry up to 30 kb of non-viral DNA (for a review see Kochanek, S. 
Hum. Gene Ther. 10: 2451-2459 (1999)). Recombinant adenoviral (rAd) vectors are 
becoming one of the most powerful gene delivery systems available and have been used 
to deliver DNA to post-mitotic neurons of the central nervous system (CNS) (Geddes, B. 
J. et al., Front. Neuroendocrinol. 20: 296-316 (1999)), and are used to treat diseases such 
as colon cancer (Alvarez et al., Hum. Gene Ther. 5: 597-613 (1997)). Adeno-associated 
virus (AAV) vectors and recombinant AAV (rAAV) vectors are proving themselves to be 
safe and efficacious for the long-term expression of proteins to correct genetic disease. 
Snyder, R. O. (J. Gene Med. 1: 166-175 (1999)) provides a review of gene delivery 
approaches using such vectors. Construction of such vectors is described in, for example, 
Samulski et al., J. Virol. 63: 3822-3828 (1989), and U.S. Pat. No. 5,173,414. 

Many gene therapy trials have been conducted and are underway (over 3,500 
people have been treated with gene therapy systems), and several reviews can be studied 
for details of the protocols and results (Hwu & Rosenberg, Ann N Y Acad Sci. 1994 May 
31;716:188-97; Blaese, Hosp Pract (Off Ed). 1995 Nov 15;30(11):33-40; Blaese, Hosp 
Pract (Off Ed). 1995 Dec 15;30(12):37-45; Breau & Clayman, Curr Opin Oncol. 1996 
May;8(3):227-31; Dunbar, Annu Rev Med. 1996;47:11-20; Lotze, Cancer J Sci Am. 
1996 Mar;2(2):63). The first gene therapy trial was carried out by Blaese et al. (1995), to 
correct a genetic disorder known as adenosine deaminase (ADA) deficiency, which leads 
to severe immunodeficiency. Several cancer gene therapy strategies are being developed, 
which involve eliminating cancer cells by suicide therapy (Oldfield et al., Hum Gene 
Ther. 1993 Feb;4(1):39-69), modification of cancer cells to promote immune responses 
(Lotze et al., Hum Gene Ther. 1994 Jan;5(1):41-55), and reversion by delivery of a tumor 
suppressor gene (Roth et al., Hum Gene Ther. 1996 May 1;7(7):861-74). Another 
successful gene therapy trial has been conducted to combat graft-versus-host disease, 
which can result following transplant procedures such as bone marrow transplants 
(Bonini et al., Science. 1997 Jun 13;276(5319):1719-24). This procedure was carried out 
using an HSV-based vector. Several gene therapy treatments are under investigation for 
the treatment of HIV-1 infection. Most treatments involve modification of lymphocytes, 
ex vivo, to suppress the expression of viral genes, by means of ribozymes, antisense RNA, 
mutant trans-dominant regulatory proteins and modification to elicit a host immune 
response (Nabel et al., Cardiovasc Res. 1994 Apr;28(4):445-55; Galpin et al., Hum Gene 
Ther. 1994 Aug;5(8):997-1017; Morgan RA, Walker R. Gene therapy for AIDS using 
retroviral mediated gene transfer to deliver HIV-1 antisense TAR and transdominant Rev 
protein genes to syngeneic lymphocytes in HIV-1 infected identical twins. Hum Gene Ther. 
1996 Jun 20;7(10):1281-306; Wong-Staal et al., Hum Gene Ther. 1998 
Nov 1;9(16):2407-25). Vectors currently in use for gene therapy treatments and animal 
tests include those derived from Moloney murine leukemia virus, such as MFG and 
derivatives thereof, and the MSCV retroviral expression system (Clontech, Palo Alto, 
California). Many other vectors are also commercially available. 

Viral vectors are especially important in applications when a specific tissue type is 
to be targeted, such as for gene therapy applications. There are two available methods for 
targeting genes to a specific cell or tissue type. One strategy is to control 
expression of the required gene using a tissue-specific promoter (discussed above), and 
another strategy is to control viral entry into cells. Viruses tend to enter specific cell types 
according to the envelope proteins that they express. However, by engineering the 
envelope proteins to express specific proteins as fusions, such as erythropoietin, 
insulin-like growth factor I and single chain variable fragment antibodies, viral vectors can be 
targeted to specific cell types (Kasahara et al., Science. 1994 Nov 25;266(5189):1373-6; 
Somia et al., Proc Natl Acad Sci USA. 1995 Aug 1;92(16):7570-4; Jiang et al., J Virol. 
1998 Dec;72(12):10148-56; Chadwick et al., J Mol Biol. 1999 Jan 15;285(2):485-94). 

In one example of tissue specific targeting in transgenic mice, a novel transgene 
delivery system has been developed in which the target tissue type expresses an avian 
viral receptor (TVA), under the control of a tissue specific promoter. Transgenic mice 
expressing the TVA receptor are then infected with avian leukosis virus, carrying the 
transgene(s) of interest (Fisher, G. H. et al., Oncogene 18: 5253-5260 (1999)). 

h. Construction of Zinc Finger Libraries 

Zinc finger libraries may be constructed from naturally-occurring human zinc 
finger modules. Thus, the invention provides libraries of zinc finger modules. Module 
libraries according to the invention may be assembled combinatorially into zinc finger 
polypeptides. The combinatorial assembly may be carried out biologically, using random 
assembly and selection technologies, or in a directed manner under computer control, 
assembling desired modules to produce zinc fingers having defined or random specificity. 
In accordance with the invention, libraries may be constructed entirely from natural zinc 
finger polypeptide modules from which zinc finger polypeptides having any desired 
specificity may be isolated. The invention, in its most preferred aspect, does not require 
the engineering of the specificity of any zinc finger module in order to produce a zinc 
finger polypeptide having specificity for any desired nucleic acid sequence. 

Selection of appropriate zinc finger modules for assembly into libraries of 
composite binding polypeptides having a predetermined binding specificity can be 
accomplished by applying the rules for zinc finger binding specificity set forth herein. In 
the case of zinc finger assembly under computer control, a rule table may be used to 
select zinc fingers for binding to the target site. Figure 1 shows a flowchart depicting part 
of the logic used in the selection of zinc fingers from a natural library in accordance with 
the invention. The logic set forth in Figure 1 may be supplemented, for example using 
rules relating to zinc finger overlap. Functional testing of zinc fingers for binding to the 
desired binding site may be implemented in an automated fashion and integrated with the 
zinc finger design system. 
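
By way of illustration only, a computer-controlled selection of modules from a natural
library might be sketched as follows. The sketch is not the rule table or the Figure 1
logic of the disclosure: the names BASE_CODES, LIBRARY, subsite_matches, split_triplets
and candidate_modules are hypothetical, the library holds only a few module/subsite
pairs taken from Example 1, and the matching ignores finger orientation and overlap
between adjacent fingers.

```python
# Hypothetical sketch: names and matching logic are illustrative assumptions,
# not the rule table or the Figure 1 logic of the disclosure.

# Degenerate base codes as they appear in the subsite column of Example 1
# (N = any base, Y = C or T).
BASE_CODES = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT", "N": "ACGT"}

# (module name, predicted 3-nucleotide subsite); a few entries from Example 1.
LIBRARY = [
    ("ZIF268 F1", "GCG"),
    ("ZIF268 F2", "TGG"),
    ("WT1 F2", "GAG"),
    ("TYY1", "TAT"),
    ("SP4", "NGG"),
    ("BCL6", "NGA"),
]


def subsite_matches(subsite, triplet):
    """Return True if every base of the triplet is allowed by the subsite code."""
    return len(subsite) == len(triplet) and all(
        base in BASE_CODES[code] for code, base in zip(subsite, triplet)
    )


def split_triplets(target):
    """Split a target site into consecutive, non-overlapping triplets."""
    if len(target) % 3 != 0:
        raise ValueError("target length must be a multiple of 3")
    return [target[i:i + 3] for i in range(0, len(target), 3)]


def candidate_modules(target, library=LIBRARY):
    """For each triplet of the target, list the modules predicted to bind it."""
    return [
        (triplet,
         [name for name, subsite in library if subsite_matches(subsite, triplet)])
        for triplet in split_triplets(target.upper())
    ]


if __name__ == "__main__":
    for triplet, names in candidate_modules("GCGTGGGCG"):
        print(triplet, names)
```

A fuller design system would rank the candidates for each triplet and pass the assembled
polypeptides to functional testing, as described above.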

The invention thus provides libraries of zinc finger modules. In one embodiment, 
the modules are human zinc finger modules. Preferably, the modules are DNA-binding 
zinc finger modules. 

In a preferred aspect the invention provides a library of DNA-binding human zinc 
finger modules as set out in Example 1 below. Moreover, the invention provides a library 
of human zinc finger modules as set forth in Example 2 below. Sub-libraries can be 
prepared from either of the libraries of the invention. 

The invention furthermore encompasses libraries in which zinc finger modules as 
set forth in Examples 1 or 2 herein are combined with other zinc finger modules to 
provide further libraries that may be used to generate zinc finger polypeptides. 

In a still further aspect, the invention relates to libraries derived from animals 
other than humans, for use in said organisms in order to derive some or all of the same 
advantages as may be obtained with human zinc fingers for use in humans. Example 3 
sets forth databases of zinc fingers from mouse, chicken and plants. Sequences of zinc 
fingers can be identified in other organisms by the same means, i.e. by analysis of 
sequence information and identification of zinc fingers in accordance with the guidance 
given herein. 

EXAMPLES 



Example 1. List of selected human DNA-binding zinc fingers. 

These fingers have been selected from the human genome on the basis of a prediction that 
they have a DNA-binding potential. This prediction is based on coded contacts (WO 
96/06166, WO 98/53057, WO 98/53058, WO 98/53059 and WO 98/53060); 
accordingly, for each peptide unit, a 3-nucleotide DNA target subsite is shown, as the 
preferred sequence to which the zinc finger binds. Hence, by constructing 2- or 3-finger 
libraries from these 200 or so units, in the manner described in the Examples infra, there 
exists the potential to screen a large variety of novel DNA target sites. Note that the 
predicted DNA target subsites listed below are merely intended to be a guide to the 
DNA-binding potential. It is anticipated that in practice, an even wider range of DNA 
sequences can be targeted using a library engineered from this database, through the 
exertion of a positive selection pressure in the library screening system. 
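
As a rough illustration of the scale involved, the number of ordered combinations in a 2-
or 3-finger library can be estimated as follows. The function name library_size is
hypothetical, and the estimate assumes that every unit may occupy every finger position,
with repeats allowed; it says nothing about how many combinations actually bind DNA.

```python
# Illustrative arithmetic only; assumes any unit may occupy any position.

def library_size(n_units, n_fingers):
    """Number of ordered n-finger combinations drawn from n_units modules."""
    return n_units ** n_fingers


if __name__ == "__main__":
    for units in (200, 227):
        for fingers in (2, 3):
            print(units, "units,", fingers, "fingers:", library_size(units, fingers))
```

With about 200 units this gives 40,000 two-finger and 8,000,000 three-finger
combinations; with the 227 units of the database below, 51,529 and 11,697,083
respectively.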

The fingers listed below are in a format that can be linked with classical wild-type 
canonical "TGEKP" (SEQ ID NO:3) linkers (i.e. ...TGEKP - zinc finger peptide 
sequence - TGEKP - zinc finger peptide sequence - TGEKP - etc.). For each peptide 
sequence, an oligonucleotide is designed to encode the peptide sequence; the 
oligonucleotide can then be linked into a library selection system, as described in the 
Examples infra. 
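
The linked format can be illustrated with a short sketch that joins finger peptide
sequences with the TGEKP linker. The function name assemble_fingers is hypothetical, and
the three ZIF268 sequences from the database below are used purely as sample input; the
sketch shows the string format only and does not model oligonucleotide design or the
library selection system.

```python
# Hypothetical sketch of the "finger - TGEKP - finger" format described above.

LINKER = "TGEKP"  # canonical linker (SEQ ID NO:3)

# Sample input: the three ZIF268 finger units listed in the database.
ZIF268_FINGERS = [
    "YACPVESCDRRFSRSDELTRHIRIH",
    "FQCRICMRNFSRSDHLSTHIRTH",
    "FACDICGRKFARSDERKRHTKIH",
]


def assemble_fingers(fingers, linker=LINKER):
    """Join finger peptide sequences, placing one linker between neighbours."""
    if not fingers:
        raise ValueError("at least one finger sequence is required")
    return linker.join(fingers)


if __name__ == "__main__":
    print(assemble_fingers(ZIF268_FINGERS))
```

Flanking linkers at either end, as in the "...TGEKP - finger - TGEKP ..." notation, could
be added by the caller where the library selection system requires them.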

Database of predicted human DNA-binding zinc fingers 

227 finger units 



Zinc finger 


DNA site 


SEQ ID 


Peptide sequence 






NO 










ZIF268 Fl 


GCG 


31 


YACPVESCDRRFSRSDELTRHIRIH 


Z1F268 F2 


TGG 


32 


FQCRI CMRNFSRSDHLSTHIRTH 


ZIF268 F3 


GCG 


33 


FACDI CGRKFARSDERKRHTKIH 


Kr-likel3 


NGT 


34 


HKCHYAGCEKVYGKS SHLKAHLRTH 


MAZ Fl 


AGO 


35 


YQCPVCQQRFKRKDRMS YHVRSH 
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MAZ F2 


roG 


36 


YNC SHCGKS F SRPDHLNSHVRQVH 


MAZ F3 ] 


STGT 


37 


FKCEKCEAAFATKDRLiRAHTVRH 


TIEG2 (SPl) F3 ( 


3GG 


38 


FVCPVCDRRFMRSDHLTKHARRH 


SPl Fl ( 


3GG 


39 


HKCHYAGCEKVYGKS SHLKAHLRTH 


SPl F2 ( 


3CG 


40 


FAC S WQDCNKKFARSDELARHYRTH 


SPl F3 


SGG 


41 


F S CP I CEKRFMRSDHLiTKHARRH 


WTl Fl 


rGT 


42 


FMCAYPGCNKRYFKLSHLQMHSRKH 


WTl F2 


GAG 


43 


YQCDFKDCERRFSRSDQLKRHQRRH 


WTl F3 


TGG 


44 


FQCKTCQRKFSRSDHLKTHTRTH 


WTl F4 


GCG 


45 


F S CRWP S CQKKFARSDELVRHHNMH 


TYYl 


TAT 


46 


FQCTFEGCGKRFSLDFNLRTHVRIH 


TYYl 


NAA 


47 


YVCPFDGCNKKFAQSTNLKSHILTH 


TF3A 


GGG 


48 


FVCD YEGCGKAF I RD YHLSRH I LTH 


TF3A 


GGC 


49 


FKCTQEGCGKHFASPSKLKRHAKAH 


MAZ 


GGC 


50 


HACEMCGKAFRDVYHIiNRHKLSH 


GLIl 


GCA 


51 


YMCEHEGCSKAFSNASDRAKHQNRTH 


ZIC3 


GCA 


52 


FKCEFEGCDRRFANSSDRKKHMHVH 


SP4 


NGG 


53 


HI CHI EGCGKVYGKTSHLRAHLRWH 


SP2 


NTG 


54 


HVCHI PDCGKTFRKTSIiLRAHVRLH 


BTEl 


NGG 


55 


HKCP YSGCGKVYGKS SHLKAHYRVH 


GLI2 


TAG 


56 


HKCTFEGCSKAYSRLENLKTHLRSH 


Q14872 


TAT 


57 


YQCTFEGCPRTYSTAGNLRTHQKTH 


Q14872 


TGC 


58 


FRCDHDGCGKAFAASHHLKTHVRTH 


ZIC3 


TAG 


59 


FPCPFPGCGKIFARSENLKIHKRTH 


Z143 


CTT 


60 


FKCPFEGCGRSFTTSNIRKVHVRTH 


Z143 


CGT 


61 


FRCEYDGCGKLYTTAHHLKVHERSH 


000153 


AAT 


62 


FMCHESGCGKQFTTAGNLKNHRRIH 


Z143 


AAC 


63 


YYCTEPGCGRAFASATNYKNHVRIH 


Q14872 


TCT 


64 


FVCNQEGCGKAFLTSHSLRIHVRVH 


000153 


TGT 


65 


Fl CPAEGCGKSFYVLQRLKVHMRTH 


Q14872 


GCT • 


66 


FNCESEGCSKYFTTLSDLRKHIRTH 


Z143 


GCT 


67 


YRCSEDNCTKSFKTSGDLQKHIRTH 


BTEl 


GCG 


68 


F P CTWPDCLKKF SRSDEIjTRHYRTH 


015391 


TAA 


69 


r VC^irr JJ VL-JNlJ<ixrraSJo XJ>i±ji\.±xiiiJXxi 


Z143 


GNC 


70 


YVCTVPGCDKRFTE YS SLYKHHWH 


043591 


GGT 


71 


HVCEHCNAAFRTNYHLQRHVF IH 


BCL6 


TAG 


72 


YRCNICGAQFNRPANLKTHTRIH


075626 


TAG 


73 


HECQVCHKRFS STSMLKTHLRLH 


075626 


YAA 


74 


YECNVCAKTFGQLSNLKVHLRVH


BCL6 


NGA 


75 


YKCETCGARFVQVAHLRAHVLIH
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\J / D O Z D 






r jVL-I^ 1 V^J>]iS.L:jr x ^JXii^JrlXiv^JNJrl X Xj Vri 






•7 n 


X V v_vjxvt\.P rCk^xvo X iJ\,,;i-4-Lls^lv V n 




JL X 1 


/ 0 


X irL-iijXL.^ lK.r KXlXiv^ XijJxorlljrCXrl 


vjr J. X 




/ y 


X ±rv-,Vi2x^^Jt>-Jtv.r rl^ss^JVoX^lYliNJxrll r Xrl 






0 u 




ZN75 


TAY 


81 


YRC S WCGKS F S HN xNLiHTHQR x H 


Z18 6 


111 ^xxx; 


o2 


Y KCXciL-VjKXr 1 VWyXilj iXixlxlKlrl 


Z13 6 


111 VJ-J-i) 


83 


t KCKQCCjKAr bCbP ILiKXHbiKlH 


Z13 6 


ICuA 




Y JvC K V CCji^A-i* iJ Y P 0 Ki* K 1 HhiKbH 


Z13 0 


J. 1 1 ( Y X Y ; 


0 b 


YKCJ\.VCCjK.Fr Hbljbbr y VrliiKXri 


ZtLl / 


n " I"* TV 

J. 1 A 


0 ez 
00 


Y C iSJi L. JxAr KJNI b b uXjK V Ji V K 1 H 


Z13 0 


TNN 




17 Tp/^T^p r^r^inK t?p q q o q tjp t ■lttt'td ttlt 
r iliV^jSXvl^vjTJ\-B.r Koboor KXirliiKlxl 




A/ 1 - X 1 


0 0 


xKCJMhiCCjKUr 1 b 1 bKijiMK±LKi IH 


ZN42 


TY 1 


8y 


YHuCjJciCOXrijr Xy vbKXjXxliXiyKXH 


ZN4 2 


CGG 


y 0 


t VCGDCGyGr' VRbARXjEEHRRVH 


014 313 


TCG 


91 


xKChjKCGKGt FRbSDXiyHHyKlH 


014 913 


C-G/T-G 


92 


Y KC£iEL.GKGi:* SRbbKXiyBjHyX XH 


ZN45 


YYC 


93 


YKCfcjECGKGF CRASNljLiDHyRGH 


ZN4 5 


7\ 7\ 7\ 

AAA 


94 


YKCtiECCjKCul:' SyAbNliLiAxiyRGH 


ZN4 5 


NAG 


95 


-TT-z-N /-^T7ITT«/^/^ T;^/^ TP r*^T~J A C'KTTT'T 7\ Tm/~^"^7TT 

XyChiECGKGr CRAbJMr XiAHRCjVH 


Z23 9 


YYG 


95 


Y JxL,iiiyCvjJ\.vjr IKbbbXjXiXxiyAVJdl 


vjy4 0 


X Isl X 


y / 


xKL-olljv^VjiJ\.L7r X VJNbLjXiFlXixiyKlrl 




TV 7\ V 


0 Q 

y 0 


xyv^-fiiiv^Vjrrv.Lji:' 0 VLjoyXiy/^xiyKv^rl 


Z1M4 b 


JMCjX 


y y 


X ivL-ii r!jL.ljj\.Cjx:* b VvjbrlXiy/Vriy 1 brl 


ZJN4 b 


YCva 


1 U U 


YyUUAUtjri\.kjr bKbbLJr JNIXrir RVil 


^IM4 O 




J_U JL 


X Jx^Lj 1 L-*JrJ\.vjf oKooX'lJiN Vrtv-KXri 




rn/-i TV 

1 vjA. 




X JN.v^lN/^.k^VzrJN.or' 0 X ooJtlXiJ^ XxlL^rCXxl 


Zzi 3 y 


1 CJA 


1 U3 


X y Y iliCvjJvQjjl? by b bX/XiKX xlXiK V 1:1 


3 y 


xAA 


1 U4 


X JN.CvjiiiL.vjJxvjr' byoolNiXijlXrlKCXrl 


•7 0 Q 


X LjA 


1 Ub 


X JVv^X'JxU LyivLjx:* 0 y 0 b iNXjrl X 11 yK Vxl 


0 y 




X U D 


Xrlv^vji\.^LTJ\.Vjrr* iDyooJSXjljXriyK Vrl 


r^^r n "7 £C 
vJb U / 0 3 


A V*A 


XU / 


r JN.L-oii<wLjK_f\J:' £Dy oi^oXiXyxllliKXJbl 




UrX X 


X U 0 


X JiCJXiiV^vjiNJ-ii:* Xivooc?Xj:/-iJNJUiizlXi:l 




JLx-i. 




X IT v_, iVXj vjji\jC-vjr *~j X xx£-xv^xi'^r\.i*iXi 


043296 


AYY 


110 


YKCS ECGKAFSRS S SLTQHQRMH 


Z134 


ATG 


111 


YKCSECGKAFSRKDTLVQHQRIH 


Z134 


ATG 


112 


YECSECGKAFSRKATLVQHQRIH


ZN84 


AYC 


113 


YECSECGKAFSEKLSLTNHQRIH 


Z191 


AYG 


114 


YGCVECGKAFSRS S I LVQHQRVH 


ZN24 


ACG 


115 


YGCVECGKAFSRS S I LVQHQRVH 
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043338 < 


3TA 


116 


YVCGQCGKS F S QRATL I KHHK V il 


043339 < 


3TA 


117 


YE C S Q CGKS F S QKA.TLt VKHQR VH 


043338 


AYA 


118 


YDCGQ CGKS F I QKS S L I QHQ v VH 


043339 




119 


YE CGQCGKS F S QKS GL I QHQWH 


043338 


CAA 


120 


YECGECGKSFSQSSNLIEHCRIH 


Q13398 


?^AA 


121 


YECGECGKSFSQRSNLMQHRRVH 


Z135 


CYA 


122 


YECGECGKAFSQSTLLTEHRRIH


Q13398 


ACA 


123 


YECSECGKSFSQSSSLIQHRRVH 


014709 


AAA 


124 


YKCNECGKAF S QS A YLLNHQR I H 


014709 


CAA 


125 


YKCNECGKVFSQNAYLIDHQRLH 


014709 


CAA 


126 


YKCTECGKAFTQSAYLFDHQRLH 


014709 


CAA 


127 


YKCDECGKTFAQTTYLIDHQRLH 


O60792 


AAA 


128 


YNCNECRKTFSQSTYLIQHQRIH 


015535 


ANA 


129 


YHCKECGKVFSQS AGL I QHQRI H 


Q15776 


(a) 


TNA 


130 


YHGKECGKAFSQNTGLIIiHQRIH 


Q15776 


(b) 


TNA 


131 


YQCNQCGKAF S QS AGL I LHQRI H 


Q15776 


CNA 


132 


YKCNE CGRAF S QKS GL I EHQR I H 


ZN84 


AAC 


133 


YGCNE CGRAF S EKSNL I NHQR I H 


Z191 


ANA 


134 


YKCLECGKAFSQNSGLINHQRIH 


ZN24 


ANA 


135 


YKCLECGKAFSQNSGLINHQRIH 


060765 


AYA 


136 


YRCE E C G I S FGQ S SAL I QHRR I H 


ZN07 


YYA 


137 


YRCEECGKAFGQSSSLIHHQRIH 


043340 


ACA 


138 


YECDECGKSYSQSSALLQHRRVH 


Z135 


CYY 


139 


YKCQECGKAF S H S SAL I EHHRTH 


043340 


AYA 


140 


YDC SE CGKS FRQVS VL I QHQRYH 


043340 


AYA 


141 


Y VC S E CGKS FGQKS VL I QHQRVH 


Q13398 


AYT 


142 


YQCSQCGKS FGCKS VL I QHQRVH 


015535 


GNA 


143 


HKCDE CGKS FTQS S GL I RHQR I H 


Q15776 


GNA 


144 


HKCDE CGKS FAQS SGLVRHWRI H 


075802 


ANG 


145 


HKCEECGKAF SRS SGL I QHQR I H 


Z189 


ANG 


146 


HKCEECGKAFSRSSGLIQHQRIH 


075802 


ANG 


147 


HKCDE CGKAFSRNSGL I QHQRI H 


Q13398 


YYG 


148 


HECNECGKSFSRSSSLIHHRRLH 


Z195 


YAA 


1 /I Q 




043309 


CYA 


150 


YKCDKCGKAFTQRS VLTEHQR I H 


Z195 


CGA 


151 


YKCDECGKAYTQSSHLSEHRRIH


ZN45 


YYA 


152 


YKCERCGKAFSQFSSLQVHQRVH 


060893 


YYN 


153 


YECEDCGKTFIGSSALVIHQRVH 


ZN07 


TAT 


154 


YECLQCGKAFSMSTQLTIHQRVH 


060893 


CYA 


155 


YECDDCGKTFSQSCSLLEHHKIH 
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O / / D 






xiiL.UJiUorv.1 r KK.oon.ljXL7riyKorl 


ZiiM O ft 




±b / 


xiiL-GJiL-Gi\Ar oKJxorlXiX ibHWKXxl 


^J. / / 




X D O 


\riC»/~'rMJ<^/^X(^C?'CO/^C! OUT ■KTXTTUT^TDT'XJ 

xi!iCL)rlUGi\.or oyoorlXjJM vHKKl jl 


OA "XO CXC 


Ax 


Xoy 


X liv-jyiiiL-VjAAr iMK.J\.o x Xj X yxiyKXri 


Oi4 O O Q 




T /- rv 
Xo U 


xiliC VhiCGKAr IKMoGXjIKxIJnJ^XIi 


oyi *a "3 /I n 
vJ4 J J 4 U 


TV 


Xo X 


xiliCKiiCGrv.or XKl>J>JJi>XiXynK.x Viri 




TV TV 


Xo^ 


xECbECGKTr oKJsXJJNIXjXyHKKXH 


/I O "3 "5 o 

04 J J 3 o 


CGA 


X53 


YECSECGKSFSQTSHLjNDHRRxH 




AGA 


X64 


YECAQCGKAFSQTSHLTQHQRIH


Z135


AGA


165


YECSECGKAFRQSIHLTQHLRIH




TV TV 

AGA 


Xdo 


YECHDCGKSFRQSTHLTQHRRIH


*7 0 A C 


AGG 


Xd / 


xACiDCGKKr GKJbbHXjXQHyX XH 




TV r^r^ 

AGG 


1 6 o 


YEC xECGKx t X Kb 1 HLiLjQHHMXH 




TV TV /-< 

AAG 


xby 


xIliCKECGKYf bKoANLjXQHQSXH 


075290


AGG


170


YECKECGKGFNRGAHLIQHQKIH


075290


AGG 


171 


YECKECGKGFNRGAHLIQHQKIH 


050792 


CGA 


172 


YTCNECGKAFSQRGHFMEHQKIH 


075123 


CGA 


173 


YTCDQCGKGFGQSSHLMEHQRIH


043337


GYA


174 


YE CNACGKAF S Q S S TLj I RH YD X H 


075802 


GY Y 


-1 «-T r- 

175 


"XTT^ rTtrr\T /^f^T^% \ t tl'OT Tg» O mX XXITT/^T^XTT 

YE CNYCGKTF S VS S TLj I RHQR X H 




GGY 


X /6 


YECSECGKTFRVSSHLiIRHFRXH 




CYY 


T T7 
X / / 


X V CJMjn C GKGx* bob XjKDHIIiKTH 




TV '\T'\T 

AY Y 


X /o 


xGCNECGKxFbHbbbLibyxiERTH 




GAY 


1 /y 


X D C_,N HCGKb r JNI HK i JNl XjN KHE R X H 


U / blz J 


TV TV TV 

AAA 


1 o rv 
Xo 0 


X vCJ>JECGKi<i:' by X brsli:' lyjiyKXH 


/^T O "O O Q 

y±33 y o 


TV TV XT 

AAY 


X o X 


X VL.GECGK.bl:' bHbbJNIXiKJNIHyRVH 


*71VT'> C 


X X A 


TOO 

Xo^ 


Vrp/-TivTTT»/^riTrTV TPDOTD O OT •T'T TTJ/^TD HtLT 

X lUJNIiiL.Gj\Ar KyKbbXix vriyKXH 


/liX3 / 


X i 


IDT 
X O -5 


X liL- 1 iiCvjjrvl r o Jii\Al JU X XriyKXrl 


OA ^5 Q 
VJft O J o O 


VJX X 


1 QA 
Xo^ 


X xiv^XJJiCoJtvAi? vai5JEs.t> xXj VxCriyKXxl 




1 X ^ 


JL c5 D 


X JCj^^oxIjL-VjxSAJ? kjJzi iVo o XjA X riyrCXxl 


^IM u / 


VjAA 


X o o 


Vr^r'TPTPnr' VZi T7QOOQOT T TP "MOT? TTJ 




X AA 


T Q "7 

Xo / 


xjNCoyL.VjTj\Ar oyivoyj-jX ojn.yKXrl 


Zi J. O O 


xGx 


X O O 


XAV^JJJlL.xljJS^r' oxIJXvdJSXjX VJi.yKXJi. 


OA-j-a -5 Q 

V— /rr ^ ^ o o 




X O -7 


WPHFPfiKA FM'PKfi KT .VP WOR TTT 
X V vjiZj vjivrt. J7 1 ix^ i\.o xvxj V rv-Xi^^rv. x n 


OZF 


YYA 


190 


YECNVCGKAFSQSSSLTVHVRSH 


095779 


YYY 


191 


YKCKECGKAFNHC S LLT I HERTH 


Z135 


GYY 


192 


YACRDCGKAFTHSSSLTKHQRTH 


ZN80 


GYA 


193 


YECKECGKGFYYSYSLTRHTRSH 


Z177 


GYC 


194 


YECSDCGKAFIDQSSLKKHTRSH 


Z177 


GYY 


195 


YDCKECGKAFTVPSSLQKHVRTH 
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043337 2 


\CT 


196 


YDCMACGKAFRC S S EL I QHQR 1 H 


Q14585


AGY


197 


YECKECEKAFRSGSKXjI QHQRiyiirl 


Q14585 i 


\AY 


198 


YEClDCGKAFGSGSNXiTQHRRIH 


Q14585


GYY


199 


YE C KACGMAF S S G S ALTRHQR I H 


Q14585 


z^YY 


200 


YECKECGKAFYSGS SLTQHQRIH 


Q14585 


^AY 


201 


YECKECGKAFGSGANLAYHQRIH


Q14585 


3AY 


202 


FECKECGKAFGSGSNLTHHQRIH 


Q14585 


?^CY 


203 


YVCKECGKAFNSGSDLTQHQRIH 


060792 


ACY 


204 


YQCHECGKTFSYGSSLIQHRKIH 


060893 


GNA 


205 


H YCHE CGKS FAQS S GLTKHRRI H 


Z165 


GCC 


206 


YECNECGKS FAE S SDLTRHRRI H 


060893 


GAY 


207 


YECEECGKVFSHS SNL I KHQRTH 


Q15776 


NGY 


208 


YECNECGKAFSHSSHLIGHQRIH 


Z135 


GYY 


209 


YQCGECGKAFSHSSSLTKHQRIH 


Z165 


GGY 


210 


HQCNECGKAFRHSSKLARHQRIH 


Z135 


TYG 


211 


YECHECLKGFRNSSALTKHQRIH 


043361 


YGC 


212 


YECNECGKFFLDSYKLVIHQRIH 


043361 


YGC 


213 


YECSECGKFFRDSYKLIIHQRVH 


Z140 


YYG 


214 


YGCHECGKTFGRRFSLVLHQRTH 


060792 


AAA 


215 


YECNECGKAFSQHSNLTQHQKTH 


Z135 


ANA 


216 


YKCTQCGRTFNQIAPLIQHQRTH 


Z135 


ANA 


217 


YECNQCGRAFSQIiAPIilQHQRIH 


Z135 


ANA 


218 


YE CHECGKAFTQ I T PL I QHQRTH 


043309 


AGA 


219 


YKCNECGKAFGRWSALNQHQRLH 


ZN83 


AGA 


220 


YKCNECGKVFHNMSHLAQHRRIH 


ZN83 


AGY 


221 


YRCNVCGKVFHHI SHLAQHQRIH 


ZN83 


AGA 


222 


YKCNECGKVFNQI SHLAQHQRIH 


014709 


CAY 


223 


FECSECGRAFSSNRNLIEHKRIH 


ZN74 


GYA 


224 


YKCSECGRAFSQNHCLIKHQKIH 


Q13398 


ANA 


225 


YECSECGKSFSQNFSLIYHQRVH 


075123 


GYA 


226 


FECKECGKGFSQSSLLIRHQRIH 


Z132 (a) 


GGA 


227 


FECSECGRDFSQSSHLLRHQKvH 


Z132 


GYA 


228 


YECNECGKFF S QNS I L I KHQKVH 


Z132 (b) 


GGA 


O O Q 




Z132 


GGN 


230 


YECSECGRAFS SNSHLVRHQRVH 


Z132 


AAA 


231 


YECSECGRAFNNNSNLAQHQKVH 


Z134 


ATY 


232 


YKCSDCGKVFRHKSTLVQHES IH 


075290 


AAT 


233 


YECKECGKAFRLYLQLSQHQKTH 


Z157 


AYC 


234 


YECGECGKNFRAKKSLNQHQRIH


Z157 


TTT 


235 


YECGECGKFFRMKMTLNNHQRTH 
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ZjJNIU / 


7\ TV rp 


o £r 




Zlb / 


Ax i 


O Q 






LjCjY 


O O Q 

zi j5 o 


\7-T^/^Ts,TT^i^r^i<^irTipi\/iA7TvTC x<^T TC>tTrMr\7xr 
X iiL.iM ixC-vjivr r rl x iM o x\J_i x Kriy JtvVil 


0433 61 


GTY 


23 9 


YKCSKCCjKr r KxKL. i JjoKriyjxVll 


Z157 


CGY 


240 


YECNECGNAF YVKARIj I &HQRMti 


Z157 


CGY 


241 


YE USE CGNAF YVKVkIj 1 BHQK I H 


075123 


AGG 


242 


FECNE CGKAF I RS S KLi I Q HQRl H 


ZN07 


AGT 


243 


FKCTE CGKAFRLS S KIj I QHQR I H 


075123 


GYT 


244 


YECNECGKAF FLiSb x LiIRHQKlH 


075802


AAT


245


HKCGECGKAt* RLib 1 x IjIUH^KIH 


Z174 


GCG 


RNA 


246 


YKCDDCGKbr ^WNbEDJ^i<Hi^J^.VH 


Z202


GCG 


RNA 


247 


YRCDDCGKHFRWTSDLVRHQRTH


043345 


GTG 


RNA 


248 


YKCEECGKAYKWPSTLSYHKKIH


043345 


CA? 


RNA 


249 


YKCEECGKAFNWSSNLMEHKKIH


075346 


TAA 


250 


YRCEECGKAFNQSANLTTHKRIH


ryKT A *!> 
ZN4 J 


TAA 


OCT 
^ O 1 


X JN.L.JciiliULjJ\/ir lyooiNJJLjl XriJtvxvXrl 


ZN85 


GGA 


252 


YKCEECGKAFNQS SKLTKHKKIH 


ZN85 


GAA 


253 


YTCEECGKAFNQSSNLTKHKRIH 


Q02313 


GAA 


254 


YKCEECGKAFNQLSNLTRHKVIH 


Q02313 


CAA 


255 


YKCEECGKAFKQFSNLTDHKKI H 


Z141 


GTG 


256 


YKCEECGKAFNRSTTLTKHKRIH


ZN91 


TTG 


257 


YKCEECGKAFSRSSTLTKHKTIH 
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Example 2: List of all human C2H2 zinc fingers 

This list represents an even more comprehensive database of human zinc fingers,
including those with non-DNA-binding activities such as those mediating protein-protein
interactions and those involved in RNA binding. By including fingers from this database
in a natural finger selection system as disclosed herein, many new zinc finger proteins
having unique target specificities can be obtained. All of these peptides would
necessarily possess properties required for potential therapeutic agents, such as
non-immunogenicity.

The fingers listed below are in a format that can be linked with classical canonical
"TGEKP" linkers (i.e., ...TGEKP - zinc finger peptide sequence - TGEKP - zinc finger
peptide sequence - TGEKP - etc.). For each peptide sequence, an oligonucleotide is
designed to encode the peptide sequence; the oligonucleotide can then be linked into a
library selection system, as described in the Examples infra.
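
The linking format and the encoding step described above can be illustrated with a minimal sketch. It is not the procedure of the disclosure: the function names are hypothetical, and the codon table is an assumption that picks one arbitrary codon per amino acid, whereas a real oligonucleotide design would also weigh codon usage, restriction sites and oligonucleotide length.

```python
# Illustrative sketch only; names and codon choices are assumptions.
LINKER = "TGEKP"  # canonical linker named in the text

# One arbitrary standard codon per amino acid (assumed, not from the source).
CODON = {
    "A": "GCC", "C": "TGC", "D": "GAC", "E": "GAG", "F": "TTC",
    "G": "GGC", "H": "CAC", "I": "ATC", "K": "AAG", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCC", "Q": "CAG", "R": "CGC",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAC",
}


def link_fingers(fingers, linker=LINKER):
    """Return linker - finger - linker - finger - ... - linker."""
    return linker + "".join(finger + linker for finger in fingers)


def back_translate(peptide):
    """Encode a peptide as DNA, one codon per residue.

    Raises ValueError on any character that is not one of the 20
    standard amino acids, which also flags transcription errors.
    """
    bad = sorted(set(peptide) - set(CODON))
    if bad:
        raise ValueError(f"non-standard residue(s): {''.join(bad)}")
    return "".join(CODON[residue] for residue in peptide)


if __name__ == "__main__":
    # Two finger units taken from the list below (SEQ ID NOs 258 and 260).
    construct = link_fingers([
        "HQCAHCEKTFNRKDHLKNHFQTH",
        "HRCEYCKKGFRRPSEKNQHIMRH",
    ])
    print(construct)
    print(back_translate(construct))
```

Because `back_translate` rejects any letter outside the 20 standard residues, running it over a transcribed list is also a quick way to catch sequences damaged in transcription.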




Human zinc finger database 




968 finger units 






Name 


SEQ ID NO 


Peptide sequence 


Q92981_HUMAN


258 


HQCAHCEKTFNRKDHLKNHFQTH 


O76019_HUMAN


259 


HQCAHCEKTFNRKDHLKNHLQTH 


ZFY_HUMAN 


260 


HRCEYCKKGFRRPSEKNQHIMRH 


ZFX_HUMAN 


261 


HRCEYCKKGFRRPSEKNQHIMRH 


ZFX_BOVIN 


262 


HRCEYCKKGFRRPSEKNQHIMRH 


Q15558_HUMAN 


263 


HRCEYCKKGFRRPSEKNQHIMRH 


ZFX_HUMAN 


264 


HKCDMCDKGFHRPSELKKHVAAH 


ZFY_HUMAN 


265 


HKCEMCEKGFHRPSELKKHVAVH


Q15558_HUMAN 


266 


HKCEMCEKGFHRPSELKKHVAVH 


Z161_HUMAN


267 


YTCSVCGKGFSRPDHLSCHVKHVH 


MAZ_HUMAN 


268 


YNCSHCGKSFSRPDHLNSHVRQVH 


043829_HUMAN 


269 


YSCEVCGKSFIRAPDLKKHERVH 


O00403_HUMAN


270 


YSCEVCGKSFIRAPDLKKHERVH 


Z151_HUMAN 


271 


HKCPHCDKKFNQVGNLKAHLKIH 


Q92618_HUMAN


272 


YKCPYCDHRASQKGNLKIHIRSH 


ZFX_HUMAN 


273 


FRCKRCRKGFRQQSELKKHMKTH 


Q14526_HUMAN 


274 


YPCT I CGKKFTQRGTMTRHMRSH 


HKR3_HUMAN 


275 


FECTECGYKFTRQAHLRRHMEIH 


Q14526_HUMAN 


276 


YACDACGMRFTRQYRLTEHMRIH 


075626_HUMAN


277


YECNVCAKTFGQLSNLKVHLRVH


CTCF_HUMAN


278 


HKCPDCDMAFVTSGELVRHRRYKH 


075701 HUMAN 


279 


YSCPDCSLRFAYTSLLAIHRRIH 
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075701 HUMAN 


280 


YACSDCKSRFTYPYLLAIHQRKH 


043167 HUMAN 


281 


YACKDCGKVFKYNHFLAIHQRSH


075850 HUMAN 


282 


CACPDCGRSFTQRAHMLLHQRSH 


075850 HUMAN 


283 


YACPDCGRGFSHGQHLARHPRVH


ZN42_HUMAN


284 


FVCGDCGQGFVRSARLEEHRRVH 


075467 HUMAN 


285 


FRCVDCGKAFAKGAVLLSHRRI H 


O15015_HUMAN


286


YKCSECGRAYRHRGSLVNHRHSH


07^7 01 HUMAN 


287 


YPCPDCGRRFRQRGSLAIHRRAH


092 9^1 HUMAN 


288 


YECAI CORSFRNOSNLAVHRRVH


RPTif^; HUMAN 


289 


YKCDRCOASFRYKGNLASHKTVH


TIMA p HUMAN 


290 


YACODCGRRFHOSTKLIOHORVH


075701 HUMAN 


291 


YPCPDCGRRFTYSSLLLSHRRIH


O75701 HUMAN 


292 


HVCTDCGRRFTYPSLLVSHRRMH


O75701_HUMAN


293 


HSCPDCGRNFSYPSLLASHQRVH 


ZN49_HUMAN


294


YACVECGERFGRRSVLLQHRRVH


0439 98 HUMAN 


295 


YGCGVCGKKFKMKHHL VGHMKI H 


01 5? 0 9 HUMAN 


296 


YDCPVCNKKFKMKHHLTEHMKTH


04 3 R 2 9 HUMAN 


297 


YACHMCDKAFKHKSHLKDHERRH 


000403 HUMAN 


298 


YACHMCDKAFKHKSHLKDHERRH 


OfiO'^l R HITMAN 


299 


HOCO I CKKAFKHKHHL I EHSRIjH 

XX'^N_»\^^ V-^XVXxJ^X X^X XXVXXX XXJ X. O^XXIi^X^XaJXX 


ni2994 PTTTMAKT 


300


HECGICKKAFKHKHHLIEHMRLH


NIL2_HUMAN


301


HECGICKKAFKHKHHLIEHMRLH


012924 HUMAN 


302 


FKCTECGKAFKYKHHLKEHLRIH 


060315 HUMAN 


303 


FKCTECGKAFKYKHHLKEHLRIH 


MTT.o HTTMAN 


304 


FKCTECGKAFKYKHHLKEHLRIH


0957 8 0 HUMAN 


305 


YKCEECGKAFKRCSHLNEHKRVQ


095779 HUMAN 


306 


YKCEECGKAFKRCSHLNEHKRVQ


04'^2 9f^ HTTMAN 


3 07 


F KC S E CGKVFNKKHIjLiAGHE KI h 

X X^^w V_i« V.^ X V V X XM X VX VX X ■! 1 i 1 * X^— ' X X 1 J XX.*1« X X 


O1470 9 HUMAN 


308 


YKCKECGKGFYRHSGIiI ihlrrh 


O14709 HITMAN 


309 


HKCKECGKGF I ORS SliliMHLRNH 

X XXXp^i^ X^ ImI V^^J X^V** X. ■Xa X ^ fct^ III 11 IX XX X ^UJX^X >l X X 


ZTCTftO HTTMAN 

^xM O ^ XI U l^lnXN 


310 


CKCVE CGKVFNRRSHIjIjC YRO I h 

X^^H* V X^ V X XV X ^X^ XXX 11 aX X^^^ aJL X X 


043*^37 HUMAN 


311 


YKC I ECGKAFKRRSHLLOHORVH 

J. X VVi^ «X XpJ V— • X XX XX- X VX ^X V X XX^til 4^|^XX ^^X \. V X X 


060765 HUMAN 

\^ \j V / vj —J xx^^'t' jjr*x>i 


312 


YICKECGKAFTLSTSLYKHLRTH 


Z136 HUMAN 


313


FECKRCGKAFRSS S SFRLHERTH 


Z136 HUMAN 


314 


FVCKQCGKAFRSASTFQIHERTH 


Z136 HUMAN 


315 


YVCKHCGKAFVSSTSIRIHERTH 


Z136 HUMAN 


316 


FKCKQCGKAFSCSPTLRIHERTH


Z124_HUMAN


317 


YVCNNCGKGFRC S S SIiRDHERTH 


Z177 HUMAN 


318 


YECKECGKAFRNSSCLRVHVRTH 


Z124_HUMAN


319 


YECKHCGKAFRYSNCLHYHERTH 


O95780_HUMAN


320


YKCKECGKAFNHCSLLTIHERTH


095779_HUMAN 


321 


YKCKECGKAFNHCSLLTIHERTH


Z124_HUMAN 


322 


YPCKQCGKAFRYASSLQKHEKTH 


Z136_HUMAN 


323 


YECKQCGKAFSYIiNSFRTHEMIH 


Z13 6_HUMAN 


324 


YECKQCGKAFSYLPSLRLHERIH 


O1506 0_HUMAN 


325 


YSCKVCGKRFAHTSEFNYHRRIH 


Z136_HUMAN 


326 


YKCKVCGKPFHSLSPFRIHERTH 


Z136 HUMAN 


327 


YKCKVCGKPFHSLSSFQVHERIH 
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Zi X O O rlUlVLf^NI 


32 8 


YKCKVCGKAFDYPSRFRTHERSH 






YVCNECGKAFTCSSYLLIHQRIH 




•J J V 


YNCKECGKSFRWSSYLLIHQRIH 




^ ^ -1- 


YRCDQCGKAFSQKGSLIVHIRVH 




j> j> ^ 


YO CKE CGKS F S ORGS LAVHERLiH 




o o o 


YECQECGKSFRQKGSLTLHERIH 




o o t 


YECNECGKAFSQRTSLIVHVRIH 


OZF_HUMAN


335


YECNVCGKAFSQSSSLTVHVRSH


ZNO / ^ilUiyiAJNI 


-D O D 


YVCNDCGKAFSOSSSLIYHQRIH 


Zlb 1 ^HUMAJNI 




COCVMCGKAFTOAS SLI AHVRQH 


Z17 / ^HUMAJNI 


ft 

3 3 O , 


YDCKECGKAFTVPS SLQKHVRTH 


OZF HUMAJM 




FE CKDCGK2\F I QKSNLI RHQRTH 


^7 n XJT TlV/r 7\ TvT 

Zl / / rlUlVlAiN) 


3 ^ U 


YECSDCGKAFIDQSSLKKHTRSH


r7 n ^ T TT TIV A TV "KT 

Zl 7 7 HUMAJM 


3 X 


YECSDCGKAF I FQS SLKKHMRSH 


06 0 7 y2 ^HUMAJN 


A9 


YPr KE CGKAF I RS S SIjAKHERI H 


Z161_HUJV1AN 


3rt 3 


Y A r T YP S KAFRD S YHLRRHE S CH 


Z161 HUMAJN 


A A 


HACEMCGKAFRDVYHLNRHKLSH 


MAZ HUMAJM 


A 

3t: 3 


HACEMCGKAFRDVYHLNRHKLSH


/~\ r\ '~l TTT TTV A TV TO" 

O60/y2 HUJyiAiM 


A 
3 O 


FKCDECDKTFTRSTHLTQHQKIH 


0507^72 HUMAJM 


'I A*? 

3t: / 


YKCNECDKAFSRSTHIiTEHONTH 


Z263_HUMAJn 


A ft 
3^ O 


YKrNPrGKSFROGMHLTRHORTH 


Z2 63 HUMAJNi 


A Q 


K KC TjF r GKC F S ONTHLTRHORTH 


Z13 5 HUMAJN 


n 

3 3 U 


YE C S O CGKAFRO S THLTQHQRI H 


rr "1 '1 T TT TlV/T 7\ ^T 

Z13 5 HUMAN 


3 D J. 


YFC!HDC!GKSFRO STHLTOHRRI H 


-1 o tr TJT TTV A TV TvT 

Z 1 3 5_HUMAIM 


3 3^ 


YECSECGKAFRQSIHLTQHLRIH 


/~\ t-f r— yi '~> T TT TIVATV TvT 

075467 HUMAJM 


3 D3 


VFrAOGGKAFSOTSHLTOHORIH 


ZN07_HUMAN


354


YECLQCGKAFSMSTQLTIHQRVH


O 9 5 2 7 0 _HUMAjm 


*3 cr IT 
3 DO 


VPPOFrnKRFHOKSDMKKHTYlH 

i JrV^»^£^ V_,VJXVXvi XX%^X\.li_?X_/X JLXN-AVJLX X. X. 


r^T^T"! TTT TIVATV "KT 

GF I INHUMAN 


3 D o 


YPrOYGGKRFHOKSDMKKHTFIH 


r-t I — <~j f~" r> T TT TIVATV "NT 

07 5od0 HUMAJM 


*^ c: "7 

3 3/ 


FPGTECEKRFRKKTHLIRHORIH 


r-\ 1 r~ Cr C O TTT TIVATV TvT 

Q15552 HUMAJM 


"5 ft 
3 D O 


PP PDF.CGMRS I OKYHMERHKRTH 

" Jit V^fi vJX Xx^ ^^^^XV*l«xxx x^j X ^x XX %pX ^ ^ X ^ 


y| n C fS 1 TTT TTVATV 'VT 

04 3 5 91 HUMAJM 


T c; Q 
3 3 i7 


FP PDE CGMRF I OKYHMERHKRTH 


r* l~ IT* TTTT1VA"A ^T 

Q15552 HUMAJM 


"5 ^ n 
3 O U 


FOr ^ OCDMRF I OKYIjLiORHEKI H 


>i ^5 IT 1 TTT TIVATV INT 

04 3 5 91 HUJYLAJM 


3 D X 


FOPSOCDMRFIOKYLIjORHEKIH 


/~\ »~F r~ <"> r~ TTTTAA7\ ^T 

O7 5 8 50 HUMAJM 


"3 O 
3 D Z 


F P r 5^ F en krf s kkahltrhLiRTH 


/— \ »-7 f— f) (Zr T TT TTVA TV TvT 

O7 5o50 HUMAJM 


3 O 3 


YPCAECGKRFSOKIHLiGSHQKTH 


/-\ r> /I O O O T TT TTVA "A "NT 

094892 HUJyLAJM 


3 D *± 


FMCSECGKGFTMKRYLIVHQQIH 


/-^ ii O O O /" TTT TTVATV "NT 

0433 3 6 HUMAJM 


3 O D 


YOCSECGKSFIYKOSLLDHHRIH 


/I o 1 /T TTT TIVA A "KT 

04 316 /^HUMAIM 


3 O D 


FKCNECGKGFAOKHSLQVHTRMH 


04. 1 7 HUMAN 


367 


YTCDQCGKYFSQNRQLKSHYRVH 


PLZF_HUMAN 


368 


YECNGCDKKFSLKHQLETHYRVH 


HKR3_HUMAN 


369 


YACPTCHKKFLSKYYLKVHNRKH 


043336_HUMAN


370 


YVCNVCGKSFRHKQTFVGHQQRIH 


043336_HUMAN 


371


YVCNI CGKSFLHKQTLVGHQQRIH 


Z134_HUMAN 


372 


YDCSDCGKSFGHKYTLIKHQRIH


Z200_HUMAN


373 


YDCNHCGKSFNHKTNLNKHERIH 


015361_HUMAN


374


YDCNHCGKSPNHKTNLNKHERIH


ZN84 HUMAN 


375 


YDCNHCGKAFSRKSQLVRHQRTH 
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yMA/l PTTMZiM 




J- I—l V XV-L-i V ■ VJIT X N-f^i. O XVlVtJ v_>XJ V X i ii i. X J. i. 




Oil 




7"NrR4. TTTTM^kKr 


7R 


X x\.v^ u. xii\^VJXVn.a7 o^^xxo^^xj j_XMxrx*^x\. XXI 




•3 T Q 


X w^^Od^X\.£\Ai7 O^XN.O'^XJ vl\IXl'>^X\.Xri 


'7TVTnA 'WTTMA'M 
ZiiM O rlUl^LftlM 


p n 

J> o w 


XTVjV^ X>^v_VjrXS_ri.r' O^^I^.onXlX OXx>^l*l X Xl 


^TsTQA ■PTTTMATST 


R 1 

^ O J. 


VMP QOPrtTf A P CI nTCQOT .T QKOP TK 

Xi>! V-^OVj^V-.Vjxvri.X' O^X^O^Xi X OXx^Xv. X XI 


JM O li. UlYlfiJAl 


R O 


X V V_,OxliV^\JjIVr\r' ^^xvonxjx OXl'^Xv xXl 


C ^7 XJTTTV/TA'M 

ZiXZi i xlUlYLAIM 


Q 


x* iiv_.i\JcjL-^ji\.Oi/ v^JXJN-oSK/iJX Ijxi X xc 1 xl 


^iM O 4 Xl UlYLfiiM 


^5 R4 
-5 O 


PTrr* Q Pr*r:JT<rA P C P T^QWT . T PTTDP TW 
x* x!fV^oxZiV_,vjrl\Ax' oxxxvonxix JriT^XxXXi 


•7Tvr Q A T-TT TTV/T A TVT 


R 
-3 O O 


X XliV^ wXIj v^VJj\.M.X^ OXvJXOXxXlX OXIWiN. X XI 


■7T O TJTTM'TiTvT 
Zi J. o O XlUlviAISl 


R 
O O D 


X jrlv^i\Jiv^Vjj\/i.i oL-xv-rt-ox* s^xvxll^ilj X xl 




"J Q 
O O / 




'7 T O ^ XJT nV/1 7\ KT 


Q Q 
O O t5 


' V'Pr'PiPr^r'Tf AP*T'r*TT'Q\7PPTJMTTr'P" 


•ZTJQ n WTTMfAXT 
ZiJM o U JiUr'lraJN 


R Q 

J o y 


V"PPnpr*r5'K'APPPTr\7T)P\7P'PTMP ttt 

X IiV_.\;^XL^VjIvr3tx' xrCiXx VXyX' V X\.X11*1X\.XX1 


r\A "3 T Q TJTnVTTX'M 

vJ^ O -J J o xlUFl/VIN 


o y Vj 


X y\^\jCA\^\j)\I\r irir ivojNXj Vi\.ll^^rCXxl 


r\A '3 Q TJTTK/TT^TVT 

VJ^ O ^ o o rlUFiAJM 


^ y X 


X x!j V_.Ux!i vjXsJ-Vx* VjO XVO X X» V XvXl^Xv. X XI 


Zi ± J J rlUlYLrtJNJ 


O y Z 


X/i.V^O'liV^ljK.lJX' Jd^^JN-OI^Xj V/\XlS^XCX Jtl 




o y 3 


xlviL.oiiL-<^Kvjji:* oV^xvoiNJXjX Xxlv^xvXxi 




'3 o yi 

j5 y ^ 


VA r^Trr^rr^'Dr'Pcir^ocTJT tpttopttj 
±J\\^sSSj\^\jr<Xjr ovy»^^^-LjX JKirlSK^JX Xxl 


•7 1 "O O TUT TA/[ 7\ "NT 


"3 Q c: 
3 y D 


VAOQTir^r'T.rr'PQT^P QTVTT.T QT40PTT4 
x/\L.iDXJv_VjXjvjX7 oiJxvOivXjX OXl^ii^JtvXxl 


Z133 ^HUMAN 


o yo 


X ACJ<JiCvjKijix?i\ix<.J\.o XXjX XxlxixCXxl 




T O '7 

o y / 


X VL-K.xiiL-VjTxC^jJt' oJnl^/^VjrXjXK.XlJNJvJNjri 


^7 T "2 Q XJT TTWI 7\ "NT 


"3 Q Q 

o y o 


L- V V— K.iiiV^lJv^-'^ Xj'ii^JXoXlXJ X XjXlVj^r'l X xl 


'7T "3 0 T-TTTI\A7\"NT 
Zi X o o xl UlYLHTs! 


Q Q 

-5 y y 


V^^nPl^r'fZTrnTTQO'PrC' AXATPMOP TPT 
X V v^xccjv^vjj\,vj7ir os^ivorw vxvns^x\.xxi 




^ u u 


X X v^oiiL-VjiS.Vjjx:' JrxCJtvoIMXjX Vxiv^x\X\xl 


O Q >1 Q O O "LIT TTWT A "NT 


^ U X 


X X ^xNIJiL-^rs.vjX' ir^jJXxCXNiXjX vxlS^KXMxl 


r^Q/lQQO TUT TTV/ITMVT 


A n o 

U ^ 


X X V^OxIj v^vrrxVkJXr XrXJx\.iDXvXI X V Xx^Iv X XI. 


\jy^oi7^ xlUlYLftJNl 


A n 


X X v«iDlZjV^V.3£\.VjrJir X X IvinX V X XXx^XuMfl 


<^OyiQOO UTTTMATsT 
L/-74oyZ XlUi^lHlM 


AHA 


X X V.-.«^XiiV^vjTi\.vjrxr X wxvOImXj X XXx\jie^X\. XXT 


r^Q/l QOO UTTMATVT 

\J^*±O^Z xiuiyi/\i>i 


A n 


X XJwOXlfV-.wPLVjyX* X V ixOl^JXiX XXlViX\.XXl 




A n c 


XVjV-.iNiiV_kjJN.VjfX' X1YIJ\jDX^XjX V Xl^j/XV. XXl 


Q /I Q Q O T-IT TTWTTV TST 


A n "7 


V T r*TJp r'm'K'nPTivrT^ q p m t wop tw 

X X \^Vi Ci V^VjlS.vjrx' X PlJtvO xvl^l X XLXlv^xv X xl 


<-/i7 4 O y ^ XT UlYl/vlM 


A n Q 


P T r* Q P Om^/P'PMK' Q P T . T PWOP T W 

XT Xv_OX!iV_VjX\. V 17 Xl^iI\.OX\.XJXXLXlV£-^XXl 


w -? ^ O y ^ XlUr'LHlM 


A n Q 

Tc u y 


V T PTJP PnKTlP A PTf QKTT A A7WOP TW 

X X v^xM d V»*\Jl\.VjrX' x-Hr IVOIM XJ V V Xl^Xx. X Xl 


■7 T Q XJTTMA'NT 


A 1 n 
^ xu 


VP PINJP PnTfT P Wn Q PT , ^\ 7WOP TW 
X IlV_IMrljL-Vjrrs.X X* XlSB^rs-O x" XJ X V XlVs^^Xv X n 


•71 Q^T ■tTTT^A7\'^T 
Zj X O D il Ul^l/MM 


ATI 

*t ± X 


VPPKTPT .r^TCTPWr'K'C' PT .T^TWOTCTW 

X XLV^X>1X1jXjvJXvX x^ XlV^xviDX^ Xj X VXlV^XxXXl 


Zi X O O X1U1^1HJ>J 


A 1 9 
^ XiZ 


VnPTsJP PHTfT^/P PTCQ PT iTT .WOP TW 

X vJ V_-X\ X_j yj Xv X V XyV^oXvO J- XJ X XJXx y/XN. X XI 


ZiJN J D flUlrLHJNl 


A 1 
^ X O 


VTPKfPPriTf APPOP QST.TVWnPTW 

X X V^lMxli V^VJXN-rt.X7 Xvv^r\.OOXJ X VXlViXVXXi 


•71 R^^ WTTMA'NT 


A 1 A 


VHP S P PCiKTP <^ OKS YLT T HHP TH 


•7T tT'y TTnM2iM 
^ J. D / XlUl^li-ilNi 


A 1 c; 


VPPSP.PC^KTPPVKT SIjTOHHPTH 

X X— 1 %J X_l XV X J7 Xv V X\.X. XJ X X X XX Xv X XX 


Z186_HUMAN


416 


YKCIECGKTFTVNQLLTLHHRTH 


Z157_HUMAN 


417 


YECTECGKTFSEKATLTIHQRTH 


ZN84_HUMAN


418 


YACSDCRKAFFEKSELIRHQTIH 


ZN84_HUMAN


419 


YECSLiCRKAFFEKSELIRHLRTH 


Z140_HUMAN 


420 


YECNECRKALRCHSFLIKHQRIH 


ZN84_HUMAN 


421 


YECNECRKAFREKSSLINHQRIH 


ZN84_HUMAN 


422 


YECSECRKAFRERSSLINHQRTH 


ZN84 HUMAN 


423 


YECSECGKAFGEKSSLATHQRTH 
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ZN84 HUMAN 


424 


YECSECGKAFSEKLSLTNHQRIH 


0433 3 9 HUMAN 


425 


YECSKCGKAFRGKYSLVQHQRVH 


Z157 HUMAN 


426 


YECSECGKIFSMKKSLCQHRRTH 


Z157 HUMAN 


427 


YECGECGKFFRMKMTLNNHQRTH


Z157 HUMAN 


428 


YECGECGKNFRAKKSLNQHQRIH 


04 "^361 HUMAN 


429 


YKCSECGKAFSLKHNWQHLKIH 


HUMAN 


430 


YECSECGKAFSRKATLVQHQRIH 


7.1 '^4 HUMAN 


431 


YKCSECGKAFSRKDTLVQHQRIH


Z1 "^4 HUMAN 


432 


YECSECGKTFSRKDNLTQHKRIH 


014 70 9 HUMAN 


433 


YKCKECGKVFIRSKSLLLHQRVH 


014709 HUMAN 


434 


YECDECGKCFILKKSLIGHQRIH 


0 1 4. 7 0 9 HUMAN 


435 


YECNECGKVFILKKSLILHQRFH 


m 4 7 0 9 HITMAN 


436 


YKCNKCQKAFILKKSLILHQRIH 


7^1 4 0 HUMAN 


437 


YACAECDKAFSRS FSIil LHQRTH 


Z14 0 HUMAN 


438 


YGCHECGKTFGRRFSLVLHQRTH


09R878 HUMAN 


439 


YACAQCGKTFNNTSNLRTHQRIH 


O14709 HUMAN 


440 


YKCDMCCKHFNKI SHLINHRRIH 


ZNR'^ HUMAN 


441


FKCDI CGKI FNKKSNLASHQRIH 


7.N0 7 HUMAN 


442 


HQCEDCEKIFRWRSHLIIHQRIH 


7:n -1 7 HUMAN 


443 


HKCDDCGKVLTSRSHIilRHQRIH 


7.14 0 HUMAN 


444 


HECKDCNKTFSYLSFLIEHQRTH 


71 ft Q HTTMAKT 


445 


HKCSDCGKAFSWKSHLIEHQRTH


07*=; A 09 HUMAN 


446 


HKCSDCGKAFSWKSHLIEHQRTH 


014709 HUMAN 


447 


YKCNDCGKVFSYRSNLIAHQRIH 


04'^'^0 9 HITMAN 


448 


YGCDDCGKAFSQHSHLIEHQRIH 


07 c; 1 9*^ HITMAN 


449 


YTCDQCGKGFGQS SHLMEHQRIH 


04 3 HUMAN 


450 


YNCTACEKAFIYKNKLVEHQRIH 


O4'^'^0 9 HITMAN 


451 


YKCDVCEKAFIQRTSLTEHQRIH 


0(^07 99 HUMAN 


452 


YKCDQCGKGF I EGP S LTQHQRI H 


043309_HUMAN


453


YKCDKCGKAFTQRSVLTEHQRIH


7TvrQ1 HTTMAKT 


454 


YKCEECGKAFKQLSTLTTHKRIH 


7.Kr Q 1 HI TM AKT 

^ J- XXVJl'X£Ta.>I 


455 


YKCKECGKAFKQFSTLTTHKI IH 


7.TJQ1 HTTMAKT 

iljX^ ^ J_ XXWl'LT^XN 


456 


YKCKECDKTFKRLSTLTKHKI IH 


5?^TvrQ1 HTTMAIJ 


457


YKCKECDKTFKRLSTLTKHKIIH


7.TvJft S HITMAN 


458 


YKCEKCGKAFNHFSHIiTTHKI IH 


ZNft 5 HUMAN 


459 


YKCEECGKAFNRFSTLTTHKI IH 


ZN4 3 HUMAN 


460 


YKCEECGKAFNQFSTLTKHKI IH 


ZN4 3 HUMAN 


461 


YTCEECGKVFNWSSRLTTHKRIH 


ZN4 3 HUMAN 

X^X V ^ — ^ X X V^X JnX X^ W 


462 


YKCEECGKAFNKSS ILTTHKI IR 


075437_HUMAN


463 


YKWEKFGKAFNRSSHLTTDKITH 


043345_HUMAN 


464 


YKCEEGGKAFNWS S TLT YYKS AH 


ZN91_HUMAN


465 


YKCEECGKAFNQSSNLTTHKI IH 


ZN91_HUMAN


467 


YKCEECGKAFNRSSKLTTHKI IH 


Q02313_HUMAN 


468 


YKCEECGKAFNQSSTLTTHNI IH 


ZN91_HUMAN


469 


YKCEECGKAFNHS S SLSTHKI IH 


ZN43_HUMAN 


470 


YKCEECGKAFKLSSTLSTHKI IH 


ZN91_HUMAN


471 


YKCEECGKAFSQSSTLTTHKI IH 


Q02313 HUMAN 


472 


YKCEECGKAFNQS STLTTHKRIH 
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O95780_HUMAN 


473 


YKCEECGKAFNS SSI LTEHKVIH 


095779__HUMAN 


474 


YKCEECGKAFNS SSI LTEHKVIH 


ZN91_HUMAN 


475 


YKCKECGKAFKHSSALAKHKIIH 


ZN85__HUMAN 


476 


YKCKECGKAFKHSSTLTKHKI IH 


ZN85 HUMAN 


477 


YKCEECDKAFKWSSVLTKHKI IH 


ZN43 HUMAN 


478 


YKCEECGKAFKWS STLTKHKI IH 


ZN85 HUMAN 


479 


YKCEE CGKGF KWP S TLT I HKI I H 


ZN91 HUMAN 


480 


YKCGECGKAFKE S SALTKHKI I H 


ZN91 HUMAN 


481 


YKCEECGKAFRKS S TLTEHKI IH 


ZN91 HUMAN 


482 


YKCEECGKAFRQS STLTKHKI IH 


Q02313 HUMAN 


483 


YKCGECGKAFNQS SALNTHKI IH 


ZN91 HUMAN 


484 


CKCKECEKTFHWS STLTNHKEIH 


075437 HUMAN 


485 


YKCKECGKTFNWS STLTNHRKI Y 


ZN91 HUMAN 


486 


YKCKECGKAFSNS STLANHKI TH 


ZN91 HUMAN 


487 


YKCKECGKAFSNSSTLANHKITH 


043345 HUMAN 


488 


YKCKECGKTFIKVSTLTTHKAIH 


043345 HUMAN 


489 


YKCEECGKTFSKVSTLTTHKAIH 


043345 HUMAN 


490 


YKCEECGKTFSKVSTLTTHKAIH 


043345 HUMAN 


491 


YKCEECGKAFSKVSTLTTHKAIH 


043345 HUMAN 


492 


YKCKECGKAFSKVSTLITHKAIH 


095270 HUMAN 


493 


YACRMCGKAFKRSSTLSTHLLIH 


GFIl HUMAN 


494 


YDCKI CGKSFKRS STLSTHLLIH 


075346 HUMAN 


495 


YKCI ICGKAFKRSSTLTTHKKIH 


ZN43 HUMAN 


496 


YKCKECGKAFNQYSNLTTHNKIH 


ZN85 HUMAN 


497 


YKCPCECGKAFNRSSTLTTHRKIH 


ZN9 1 HUMAN 


498 


YKCSEECDKAFIWSSTLTEHKRIH 


ZN91 HUMAN 


499 


YKCEECGKAFI SSSTLNGHKRIH 


ZN4 3 HUMAN 


500 


YKCEECGKAFNYS SHLNTHKRIH 


095780 HUMAN 


501 


YKCEECGKAFNWSSILTEHKRIH 


095779 HUMAN 


502 


YKCEECGKAFNWSS ILTEHKRIH 


043345 HUMAN 


503 


YKCEECGKAFNWSSNLMEHKRIH 


043345_HUMAN 


504 


YKCEECGKAFNWSSNLMEHKRIH 


043345_HUMAN 


505 


YKCEECGKAFNWSSNLMEHKKIH 


043345 HUMAN 


506 


YKCEECGKAFNWSSNLMEHKKIH 


ZN91 HUMAN 


507 


FKCKECGKAFIWSSTLTRHKRIH 


ZN91 HUMAN 


508 


FKCKE CGKGF I WS STLTRHKRIH 


ZN91_HUMAN


509 


YKCEECGKAFLWSSTLRRHKRIH 


ZN91_HUMAN


510 


YKCEECGKAFLWSSTLTRHKRIH 


Q02313_HUMAN 


511 


YKCEAYGRAFNWSSTLNKHKRIH 


ZN91 HUMAN 


512 


YKFEECGKAFRQSLTLNKHKI IH 


Z141_HUMAN 


513 


YKCEECGKAFRRSTDRSQHKKIH 


075346_HUMAN


514 


YKCEECGKAFNWSSDLNKHKKIH 


ZN91_HUMAN


515 


YKCEECGKAFNWSSSLTKHKRIH 


ZN91_HUMAN


516 


YKCEECGKAFNWSS SLTKHKRFH 


ZN85_HUMAN


517 


YKCEECGKAFNWSSTLTKHKRIH 


ZN43_HUMAN


518 


YKCEECGKAFNWPSTLTKHNRIH 


ZN43_HUMAN 


519 


YKCEECGKAFNWPSTLTKHKRIH 


075437 HUMAN 


520 


YKCEECGKAFFWSSTLTKHKRIH 
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O95780 HUMAN 


521 


YKCEECGKAFNWCSSLTKHKRIH 


095779 HUMAN 


522 


YKCEE CGKAFNWC S S LTKHKRI H 


ZN43 HUMAN 


523 


YKCEECGKAFSRS SNLTKHKKIH 


7,N4 3 HUMAN 


524 


YKCTECGEAFSRSSNLTKHKKIH 


7.N9T HUMAN 


525 


YKCEECGKAFSRSSTLTKHKTIH 


07S4'^7 HUMAN 


526 


YKCEECGKAFNRSSTFTKHKVIH 


7.141 HUMAN 


527 


YKCEECGKAFNRFTTLTKHKRIH 


7.1 41 HUMTVNT 


528 


YKCEECGKAFNRS TTLTKHKR I H 


7.TSJ4 HUMAN 

^xv^>J xxui'xnJLV 


529 


CKCEKCGKAFNCPS 1 1 TKHKRIN 


n4'^'^4^ HUMAN 


530 


YKCEACGKAYNTFS ILTKHKVIH 


n4'^34R HUMAN 


531 


YKCEECGKAFSTFS ILTKHKVIH 


04 3 4 S HUMAN 

v_/ ^ M ~J xxwx*xnj.N 


532 


YKCEECGKSFSTFSILTKHKVIH 


n4.'^'^4^ HUMAN 


533 


YKCEECGKSFSTFSVLTKHKVIH 


nzL'^^Ac; HITMAN 


534 


YKCEECGKGFVMFS ILAKHKVIH 


n4'^'^4^ RTTMAN 


535 


YKCEECGKGFSMFS ILTKHEVIH 


04^^34 R HUMAN 


536 


YKCEECGKGFSMFS ILTKHEVIH 


v^*:x ^ xxVyi'xn-LV 


537 


YKCKECGKAFSKFS I LTKHKVIH 


n4'^'^4R TTTTMAN 

^ ^ •~J XXUl'Xrt-LM 


538 


YKCKECGKAFSKFS ILTKHKVIH 


04 4 HUMAN 


539 


YKCKECGKAFSKFS ILTKHKVIH 


nZL'^'^/m WTTMAN 


540 


YRCKECGKAFSKFS ILTKHKVIH 


/j _L -? -J XX V-^l*X£"lxM 


541 


FKCEECDSIFKWFSDLTKHKRIH 


riQR'TRO TTTTMAN 

J ZJ 1 Q \J XX U l*Xrl-LM 


542 


YKCEKCDKVFKRFSYLTKHKRIH 


^iQ^7 7Q HITMAN 


543 


YKCEKCDKVFKRF S YLTKHKRI H 


OQRVftn HUMAN 


544 


CICEECGKTFKWFSYLTKHKRIH 


nQc;7'7Q HITMAN 


545 


CICEECGKTFKWFSYLTKHKRIH 


7.TJ4 '\ HI IMA N 


546 


YKCEECGKAFNHFS I LTKHKRI H 


7MQ1 TTTTMAN 


547 


YKCEKCCKAFNQS S I LTNHKKI H 


009*^1*^ HUMAN 


548 


YKCEKCVRAFNQASKLTEHKLIH 


7.NrftR HITMAN 

O «^ XXV^l^XT'U.M 


549 


YKS KECEKAFNQS S KLTEHKKI H 


7>vT4 PTTTMAN 


550 


YKCKECAKAFNQS SNLTEHKKIH 


7'MR'^ HITMAN 
ZjxVI O — ) xxwl^triLXN 


551 


YKCEECGKAFNQSSKLTKHKKIH 


7Krft HTTMAKT 


552 


YKCEECGKAFNQS SNL I KHKKIH 


HA'^'^ZLc; "HITMAN 


—J —J 


YKCEECGKAFNRS AI L I KHKRIH 


04*^*^4^ HUMAN 


554 


YKCEECGKAFNQSAILIKHKRIH 


n4'^'^4R HITMAN 


555 


YKCEECGKAFNQSAILTKHKI IH 


7.N4 HT TMAN 

iLii.M'z^ xxwi^xnj.>) 


556 


YKCEVCGKAFNQFSNLTTHKRIH 


7iN4 3 HUMAN 


557 


YTCEECGKAFNQFSNLTTHKRIH 


07 53 4 6 HUMAN 


558 


YRCEECGKAFNQSANLTTHKRIH 


ZNR5 HUMAN 


559 


YTCEECGKAFNQS SNLTKHKRIH 


Z141_HUMAN


560 


YKCKDCDKAFKRFSHLNKHKKIH 


Z141_HUMAN


561 


YKCKECDKAFKQFSLLSQHKKIH 


Q02313_HUMAN 


562 


YKCEECGKAFKQFSNLTDHKKIH 


ZN43_HUMAN 


563 


YKCEECGKAFTQSSNLTTHKKIH 


ZN43_HUMAN 


564 


YKCEECGKAFTQSSNLTTHKKIH 


ZN85_HUMAN 


565 


YKCEECGKAFKQSSNLTTHKI IH 


Q02313_HUMAN 


566 


YKCEECGKAFNQLSNLTRHKVIH 


ZN85_HUMAN 


567 


YECEKCGKAFNQSSNLTRHKKSH 


095780 HUMAN 


568 


YNCEECGKAFNRCSHLTRHKKIH 
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095779 HUMAN 


569 


YNCEECGKAFNRC SHLTRHKKI H 


0957 80 HUMAN 


570 


YTCEDCGRAFNRHSHLTKHKT I H 


095779 HUMAN 


571 


YTCEDCGRAFNRH SHLTKHKT I H 


002313 HUMAN 


572 


YECEECGKAFNRSSKLTEHKYIH 

X. 1 1 Vm> X— J X-i V_J X XX^XX, X^ X >■ W*' fc*-"* X % 111 «4« 1 1 i XXX V X X 




573 


YKrFRrGKAFNRSSNT.TTHKFIH 

X XxV^Xi^.CjV_«\JX\_r^^ XM X V 1— ' XM XJ X XXXX\>X XXX 


ZN91 HUMAN 


574 


YKCEE CGKAFNRS SNLT 1 HKF I H 

X XV\^X_JX_J Va«\JX\X^X X\ XV^ t— 'XM X^ X JU X XX\.X JLXX 


ZN43 HUMAN 



575 


YKCE KCGKAFNR P SNL I EHKKI H 

X X\.\_*X_lX\.\^VJ^Xvf^X XNX^X Im^XN XJ ^ X^XXXVX^^ XX 


Z 1 4 1 HITMAKT 


576 


YTPEFPR K"T FT55 S SNFAKHKRTH 

X X \— J_J 1-1 \ — X\.XV JL X X w i-? tJXN X xTlXVX XivXV J- X X 


Z141 HUMAN 


577 


FTCEECGS I FTTS SHFAKHKI IH 

X X v«X_lX_I^^^JkJ/ Jk. im XX w l>JXXX X^XVXXXX.^ X.XX 


PI14T HTIMAW 



578 


YTCEECGKAFTCWSTjTPNEHKRTH 


Z 1 4 1 HUMAN 


579 


YTCEECGKAFROS SKTjNEHKKVH 

X X v»«x-jx_jv«>vjx\jnkX xv\^w xvxjxnxjxxx^xv v xx 


043:545 HUMAN 


58 0 


YKCEECGKAYKWSSTLiSYHKICIH 

X XW_>XjXII\^VJX\X^ X X\.v« W X XJt^ X XXXVXX^ XX 


V— / ^ J —3 t: — ' XX \j I'LcaXv 


581 

^ o u. 


YKrEECGKAYICWS S TTjS YHKKT H 

X X\.V^X_jX_I V_.VJX\X^ X X\.» ¥ t.^ X 1 li^ X XXXVXvX XX 


043345"hUMAN 


582 


YKCEECGKAYKWPSTLSYHKKIH 


043345_HUMAN 


583 


YKCEECGKAYKWPSTLSYHKKIH 


043345_HUMAN 


584 


YKCEECGKAYKWPSTLRYHKKIH 


043345_HUMAN


585 


YKCEECGKGFSWS STLS YHKKIH 


043345_HUMAN


586 


YKCEECGKAFSWLSVFSKHKKIH 


043345_HUMAN 


587 


YKCEECGKAFSWLSVFSKHKKTH 


O95780_HUMAN


588 


YKCEECGKAFHWCSPFVRHKKIH 


095779_HUMAN


589 


YKCEECGKAFHWCSPFVRHKKIH 


Z195_HUMAN


590 


YTCEECGNI FKQLSDLTKHKKTH 


Z195_HUMAN


591 


YKCEE CGRAFMWF SD I TKHKOTH 

X XW>>X_lX_J VJX\>X^X X XVVX l^X^X. X XXX X X X XX 


043 3 45 HUMAN 


592 


YKCEE CGKAF S WP S RLTEH KATH 


04 "5 4 5 HUMAN 


593 


YKCEE CDKTX F SWP S SIjTEHKATH 

X X\.V^ \\ F 1v^i-yX\-fT.X VV XT 1^ XJ X X-IXXXX-T^ X XX 


7!Kr4 TTTTMAKT 


5Q4 


YKCEECGK AFKWS S KTjTEHKT TH 

X XVV_< XZiXZi V_« v7X\_c^X XWV tJ iJ X\, 1 tXX-«XXXvX. X XX 


^ O XX 1>J i'XrvLN 


5Q5 

■.J ^ •J 


YKrEF.CGKAFKWSSKTjTEHKT.TH 

X XWw> X_J X-J V— vJX>-tiX X\_¥ V X\_J 1 X X_lX XX <- J. J X XX 


^ZMQI TTTTMZVTvT 

^ JAI -7 X n, U 1*1«J.>I 


5 Qfi 

^ 7 o 


YKCEECGKAPSHSSATjAKHKRTH 

X xvv_.X-iX-j v>-'\-7X\_r^xr ijxxtj i i r^x vx xxvx\- x xx 


ZN91 HITMAKT 


597 


YKCEECGKAFSHSSALAKHKRIH 


ZN9 1 HUMAN 



598 


YKCEECGKAFSHSSTIxAKHKRIH 


ZN9 1 HUMAN 



599 


YKCEECGKAFSOPSHLiTTHKRMH 

X XW^ X_IX^ >^\JXXX^X fc_^^^X >>^XXXJX XXXX\>X\^ XXX 


ZM9 1 HUMAN 


600 


YKCE E CGKAF S O S S TIjTRHKRIjH 

X i V i — ] -L-^ v_; X njtxx i — ' iw/ -x -i— ■ -l x\^x xvx vx-ixx 


ZN91 HUMAN 


601 


YKCEECGKAFSOS stltrhtrmh 


Z124 HUMAN 


602 


YE CMECGKAIiGF SRS LNRHKRI H 


Z141_HUMAN


603 


YKCDECGKAFGRSRVLNEHKKIH 


ZN74 HUMAN 


604 


YKCDE CGKAFT W S TNLLEHRR I H 


Z195 HUMAN 


605 


YKCD E CGKAYTQ S S HL S EHRR I H 


Z195 HUMAN 

&J ^ XX\^A X4r^X>N 


606 


YKCDECGKNFTOSSNIjIVHKRIH 


Z195 HUMAN 

£-1 JU. ^ ^ X X \J 1 X£^_L>i 


607 


YKCDE CGKNFTO S SNL I VHKR I H 


ZN8 0 HUMAN 


608 


YKCKECGS VFNKNSLLVRHQQ I H 


Z165_HUMAN


609 


FGCKECGRAFNLNSHL I RHQRI H 


Q02313_HUMAN 


610 


YKCKECGKAFNQTSHLIRHKRIH 


O607 92_HUMAN 


611 


YKCNECGRAFNQNIHLTQHKRIH


ZN74_HUMAN 


612


YRCGECGKAFNQRTHLTRHHRIH 


Q15776_HUMAN 


613 


YKCKECGKAFNGNTGLIQHLRIH 


O43309_HUMAN


614 


YKCDECGNAFRGITSLIQHQRIH 


O43309_HUMAN


615 


YKCEECGKAFRGRTVLIRHKI IH 


075123 HUMAN 


616 


YVCNECGKRFSQTSNFTQHQRIH 
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^1 "7 
OX/ 


YKPN'RCGKAFNGPSTFIRHHMIH 




O X o 


FVCSECGKAFTHC S TF I LHKRAH 




O JL ^ 


YFCSOCRKAFTHRSTF I RHNRTH 


0432 57 0 HUJyiAJNI 


oz u 


VKTNFPGTC A FTHRSNF VTjHNRRH 

T f\ V -IV r'lV ^VTJX/^ 17 iXi.i\.lJXMJL V/ XJl xXM XvX vX X 


OZF HuMAJNl 


D Z J. 


vnPT^TF r(^K A F c;OF v^TKAT iHIiR T H 


ZN83 HUMAN 


o z z 


VTfPKTFRriTCAFHnriT iKTiPTHOT TH 

X x\.V^xAJ JZjx\.vjrxVx^i^ xL\^\j±Jj.J.-i-ic j.xxSi^ a. o-xx 


ZNO 7 HUMAN 




VTrr'TvT'B'PnTCAFCin'NrSTIjFnTTOT TH 
X J\.v_-i.>j S2i o v^i.\ o X j-ii* \^rxs^x X J. J. 


ZN83 HUMAN 


DZ 4 


VTrr'TvTT^nrilTr^T'FC'PKrQV'TiAn'fTT.T TW 
X JN-L^iN JiV^vjivV J? ojaXM o X xxMs«£jn.xjx xn 


ZN8 3 HUMAN 


o c: 
bZ D 


VFr'MTf r'riTOTTTQPlsTQVTiVn'H'TiT TH 

X liv^lMXvL^vJJXV V 0£vL>] O i X> V v^-H-XlX X XT. 


ZN8 3 HUMAN 


oz o 


X jVL-i\ xii VJXV V X* VJf J_llM O OXJ«XlJxX\.i\.XXT. 


ZN8 3 __HUMAN 


^ O "7 
OZ / 


X j\x^ IN Hj ^ xv V V n\gJX orixji-i.v^rix\. JL xxx 


ZN8 3__HUMAN 


oZ o 


X JT^-^l-Vl J-/ w x\. V XjJ.vi^'iOXxXJl/^'i^Xixx.JCvX JX 


ZN83 HUMAN 


o Q 

bz y 


X J\.v_.i\liZjL-»vjxV V X" iMS^X oriXl/-lv^xi.v^X\.Xxx 


ZNB 3_HUMAN 


63 U 


X XvL-iM V V^wivV P xinx OXj.J_Lrt.V>^XiV^X\.XXx 


ZN83 HUMAN 


D J J. 


X jS.Vw-lJrIjV^wx\.V J? OVat/lN O X Xlri. X XX V V Xv. X n. 


Z18 9_HUMAN 


b-3 Z 


vTrnnFrr^iTf TF ci vc* ahi jVOHOR T H 

X J\.v— X/Xlj^vjrJXX X O V OJtXxxj v y^xxs^xv, x ix 


075802 HUMAN 


b3 J 


VTCPnF PflTTTF Vfl AHTiVOHORTH 

X x\.v^UJZi^v7xvX X O V OxxxxX_J V ^xx^xvxxx 


m^Trt O TTTTfV/TTV TVT 

ZN8 3 HUMAN 


bo 4 


VTrPnTTPTTPTAFC^nMC^HT iVOFTHP TH 

X X^.v^l-'XIiV^X/XV.riLX OSi^xM O X xx_i V v^xxxax\.x 


^-v ^ /~\ *^ T TT TTV R TV TvT 

O 6 0 7 9 2 ^HlJMAN 


^ o c 


X JA.V-.^JJJcjV-.VjJX/Tk.r' OStfXVXXxJLJ V'^Xi^XX.XXi 


0433 61_HUMAN 


^ o ^ 

63 6 


XJiL-.VjXiOOJXVX' xvXxMOOXJxXSJx^X XXx 


ZN8 3 _HUMAN 


637 


r .^v-,JMii^vjJ\/\x' oi*irco ojlj xixxixxi^xxi. 


06 0 7 9 2_HUMAN 


^ o o 

63 8 


X JfcV.V^iNIliL-O'JNjhi.X' O X l^OOXl X ^^xixvixxxa 


-* ^ r- I T TT T* iT T\ "KT 

Z13 7__HUMAN 


o Q 

63 y 


vTrvwnpriTr^ZF n A Q Y A KTTR P T H 

X XN. X x±LJ\ vJTXv V X Os^x^O Lj X J-^±\J. ix\,iv J- X X 


^> 1 yl T TT TTV /T "TV TvT 

014709 HUMAN 


64 U 


vTrPtrnprtlTr A F QVMQ c;t T iVHPP T H 

X r\ V - nt 1 Jl, - vT J\ f-^A? O J.XM OOXJXl V irix\,X^^XX 


Z124_HUMAN 


64 1 


X V \--.l^lxik^wJ\J-i.X OVw-XJOOXJ^^VjlXxX INJtJTX 


>^ ^ *^ 1— » /-s #^ TTT TTV/TTV TVT 

06 07 92 HUMAN 


64Z 


vnntTFr'nTrT'FC'vric' qt .tohpk'TH 

X ^L^XlXliwwJXX X O i IJXO OXJX ^XxA\.XvXXx 


O60 792_HUMAN 


643 


X J Jl. -A^p.l -t-Tlxri X O X Vv OOXJLn.v^XxXJI\.XXx 


ZN8 3_HUMAN 


644 


X ^V-.iN Jiv_.vJxvV x: OxlxVO OXJ VXNJXl VV JvXXx 


ZN83 HUMAN 


y- yl f— 

64 b 


X JS-L-1>1 JiL-Virx\.V x: OxlJXOOlJ VL>IXlV>JXv.XXx 


Z132_HUMAN 


H A C 

64 6 


VTrr*QT?r^rtlTrFFQPTf Q QTiTPTTWPVH 
X Oxj^vJxvX^ V O XvlvO O XJ X v— xx Vt Xv V xl 


V ^ O T TT TTVyr TV IVT 

04 333 9_HUMAN 


64 / 


X J\.V»»JNXL V^kJxVX; V tD\^ X O XxXJi.M X-'XiXvxv X xl 


04 3 33 8 HUMAN 


^ A Q 

64 o 


VFPQFPr:iTrc»FC»nTQHTiTsrnHPP TH 
X x-i O JC4\_, vzrX\.i_) " X OxxxjxnX/xxx\.xn.x XI 


r^TVT A r~ TTTTTViTTV TvT 

ZN4 5_HUMAN 


64 y 


VTrnMAPftlTf QFQVC; QTTT iKTTHPP TH 

X JXwJNxiv—VjXVO P O X OOxXXJXn XXX\_«x\.XX4. 


f-yXT >1 1 — T TT TTV A TV XT 

ZN45 HUM/Vn 


^ c n 
6bU 


vTTpn'ppriiTmFQP q qht ."MVHPP th 

X JVv^VjX ^VJj\.wX OXtOOX/XJiN V xxv^xvXxx 


ry *~\ y — T TT TTV if TV TVT 

Z2 63 HUMAN 


6 b X 


VTrr^ PT .n(^TrKTF QTVTNTC^ KTT . T P HOP T H 

X xVv-ir XjV VjXSXN X OXMIM OxMXlX x\.XX<^x\.XXX 


n *~S r\ r-\ TTTTTV/TTV TvT 

Z 2 0 2 HUMAN 


c c; O 
o bZ 


VT PPT PriTf Q F R n YHT . T PHOP TH 

X X xT X V^VjXViZ) S, O XvVJ X XXXI X X\.XXSeiX\, XXX 


r~ o cr r\ tttttvatvat 

O7 5 85 0 HUMAN 


6 b J 


TT G ppoPOTfci F c?P KTHLVRHOLTH 


rrOrtC TTT TTV A A TvT 

Z2 05 xiUMAN 


^ c: A 


YAPPT.PriKqFSRRSNLHRHEKIH 

f-\ \ . f i IV - V T r\ ti r* t^x\.xN.i.Px>ixjxxxvxxx-jx\.^ XX 


T IT ET "0 Cr TJT T1WT A TvT 

Olbbjb HUMAN 


c: 

o D -3 


HOPTFPnK^^FNRHPNLiIRHOKIH 

y \ V-j^ - 1 - V ■ ' > f * ' XM X\_i Xv_»XM AJ J. XVXX^^XX. J- AX 


r7TVTO A TJTTTVATVTvT 

ZW2 4 HUMAN 


O D O 


V F P VO P (^Kfi Y S O S S NLFRHORRH 

X XIjv-» V 'i^v-»vjrx\.0 X w'^u 1— 'Xii J-ix x>>xx^xvxvxx 


Z191_HUMAN 


657 


YECVQCGKS YSQS SNLFRHQRRH 


Q99592_HUMAN 


658 


YTCTQCGKSFQYSHNLSRHAWH 


Q13 3 97__HUMAN 


659 


YTCTQCGKSFQYSHNLSRHAWH 


Z18 9_HUMAN 


660 


YLCRQCGKSFSQLCNLIRHQGVH 


O75802_HUMAN 


661 


YLCRQCGKSFSQLCNLIRHQGVH 


Z18 9_HUMAN 


662 


YQCKECGKSFSQLCNLTRHQRIH 


O75802_HUMAN 


663 


YQCKECGKSFSQLCNLTRHQRIH 


Z263 HUMAN 


664 


YKCTLCGENFSHRSNLIRHQRIH 
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Z263_HUMAN 


665 


YKCPECGEIFAHSSNLLRHQRIH 


0958 78_HUMAN 


666 


YKCSECGKSFSRSSNRIRHERIH 


Z2 63_HUMAN 


667 


YTCHECGDSFSHSSNRIRHLRTH


043336_HUMAN


668 


YVCIICGKSFIRSSDYMRHQRIH 


043336_HUMAN


669 


YVCMECGKSFIHSYDRIRHQRVH 


BCL6_HUMAN 


670 


YRCNICGAQFNRPANLKTHTRIH 


Z133_HUMAN 


671 


YKCGECGLSFSKMTNLLSHQRIH 


ZN75__HUMAN 


672 


YRCSWCGKSFSHNTNLHTHQRIH 


O60 8 93_HUMAN 


673 


YKCNECERSFTRNRSLIEHQKIH 


ZN74__HUMAN 


674 


YKCSECGRAFSQNHCLIKHQKIH 


0147 09_HUMAN 


675 


YACSECGKGFTYNRNLIEHQRIH 


Z177_HUMAN 


676 


YKCFQCEKAFSTSTNLIMHKRIH


O60792_HUMAN 


677 


YKCNECEKAFSRSENLINHQRIH 


094892_HUMAN 


678 


YGCTLCAKVFSRKSRLNEHQRIH 


Z189__HUMAN 


679 


YHCTKCKKSFSRNSLLVEHQRIH 


O758 02_HUMAN 


680 


YHCTKCKKSFSRNSLLVEHQRIH 


O433 09_HUMAN 


681 


YQCTQCNKSFSRRS ILTQHQGVH 


015535_HUMAN 


682 


YQCSQCSKSYSRRSFLIEHQRSH 


Z2 05__HUMAN 


683 


YTCPACRKSFSHHSTLIQHQRIH 


Z189_HUMAN 


684 


YTCIECGKSFSRSSFLIEHQRIH 


O758 02__HUMAN 


685 


YTC I ECGKS FS RS S FL I EHQRI H 


Z189__HUMAN 


686 


FQCNECGKS FSRS S F VI EHQRI H 


O758 02_HUMAN 


687 


FQCNECGKSFSRSSFVIEHQRIH 


Z18 9_HUMAN 


688 


YLCTVCGKS FSRS S FLI EHQRI H 


O758 02_HUMAN 


689 


YLCTVCGKSFSRSSFLIEHQRIH 


O14 7 09_HUMAN 


690 


YECHVCRKVLTSSRNLMVHQRIH


O147 09_HUMAN 


691 


YECDKCRKSFTSKRNLVGHQRIH


ZN3 5_HUMAN 


692 


YECNECGKTFTRSSNLIVHQRIH 


075123_HUMAN 


693 


YECNECGKSFIRSSSLIRHYQIH


043296_HUMAN


694 


YECVECGKSFCWSTNLIRHAI IH 


043296_HUMAN


695 


YECSECGKVFLESAALIHHYVIH


043337_HUMAN 


696 


YECTQCGKAFHRSTYLIQHSVIH


043296_HUMAN 


697 


YECTECGKTFI KSTHLLQHHMIH 


O75290_HUMAN 


698 


YECKECGKYFSRSANLIQHQSIH 


Z2 0 5_HUMAN 


699 


YACTDCGKRFGRSSHLIQHQI IH 


Z165_HUMAN 


700 


YECSECGKTFRVSSHLIRHFRIH 


Q15776_HUMAN 


701 


YECDECGKTFRRSSHLIGHQRSH 


Q15776_HUiyiAN 


702 


YECNECGKAFSHSSHLIGHQRIH 


Z189_HUMAN 


703 


YECNYCGKTFSVSSTLIRHQRIH 


O75802_HU3yiAN 


704 


YECNYCGKTFSVSSTLIRHQRIH 


043337_HUMAN 


705 


YECNACGKAFSQSSTLIRHYLIH 


ZN07_HUMAN 


706 


YECSECGKAFSRSSYLIEHQRIH 


Z13 2__HUMAN 


707 


YECSECGKAFAHSSTLIEHWRVH 


O43340_HUMAN 


708 


YECSECGKAFSCNIYLIHHQRFH 


Z135_HUMAN 


709 


YECGECGKAFSQSTLLTEHRRIH 


043 33 8_HUiyiAN 


710 


YECGECGKSFSQSSNLIEHCRIH 


04333 8_HUiy[AN 


711 


YECGKCGKSFTQHSGLILHRKSH 


Z140 HUMAN 


712 


YECDECGKVFTWHASLIQHTKSH 
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Q13398_HUMAN 713 

Q133 98_HUMAN 714 

O43340_HUMAN 715 

O43340_HUMAN 716 

O43340_HUMAN 717 

O43340_HUMAN 718 

O43340__HUMAN 719 

O43340_HUMAN 720 

Q133 98__HXJMAN 721 

Q13398__HUMAN 722 

Z132_HUMAN 723 

Z132_HUMAN 724 

Q13398_HUMAN 725 

043339_HUMAN 726 

043339_HUMAN 727 

O60765_HUMAN 728 

043338_HUMAN 729 

043339_HUMAN 730 

043338_HUMAN 731
Q13398_HUMAN 732

043340_HUMAN 733
O43340_HUMAN 734
Q13398_HUMAN 735
043339_HUMAN 736
043338_HUMAN 737
043339_HUMAN 738
Q13398_HUMAN 739
O43340_HUMAN 740
O43340_HUMAN 741
O43340_HUMAN 742
Z189_HUMAN 743
O75802_HUMAN 744
O43340_HUMAN 745
O43309_HUMAN 746
Q15776_HUMAN 747
015535_HUMAN 748
O60792_HUMAN 749
Q15776_HUMAN 750
ZN84_HUMAN 751
Q15776_HUMAN 752
Z189_HUMAN 753
O75802_HUMAN 754
Z189_HUMAN 755
O75802_HUMAN 756
ZN24_HUMAN 757
Z191_HUMAN 758
OZF_HUMAN 759 
Q15776_HUMAN 760 



YACPECGKSFSQIYSLNSHRKYH

YECSKCGKSFKQSSSFSSHRKVH 

YECSECGKSFSHSTNLFRHWRVH 

YECSECGKSFSHSTNLYRHRSAH 

YECSECGKSFSQSSGLLRHRRVH 

YKCSECGKSFSQSSGFLRHRKAH 

YECSECGKVFSQSSGLFRHRRAH

YECDECGKSYSQSSALLQHRRVH 

YECSECGKSFSQSSSLIQHRRVH 

YECGECGKSFSQRSNLMQHRRVH 

YECSECRKSFSRSSSLIQHWRIH 

YECSQCGKSFSRSSALIQHWRVH 

HECNECGKSFSRSSSLIKHRRLH

YKCGECGNSFSQSAILNQHRRIH 

YKCGDCGKSFSQSSILilQHRRIH 

YRCEECGISFGQSSALIQHRRIH 

YECGQCGKSFSLKCGLIQHQLIH 

YECGQCGKSFSQKSGLIQHQWH 

YDCGQCGKSFIQKSSLIQHQWH 

YQCSQCGKSFGCKSVLIQHQRVH 

YVCSECGKSFGQKSVLIQHQRVH 

YDCSECGKSFRQVSVLIQHQRVH 

YECSECSKSFSCKSNLIKHLRVH

YECGQCGKSFSQKATLIKHQRVH 

YVCGQCGKSFSQRATLI KHHRVH 

YECSQCGKSFSQKATLVKHQRVH 

YECSECGKSFSQMFSLIYHQRVH 

YECSVCGKSFIRKTHLIRHQTVH 

YECSECEKSFSCKTDLIRHQTVH 

YECRECGKSFTRKNHL I QHKTVH 

HKCEECGKGFVRKAHFIQHQRVH 

HKCEECGKGFVRKAHF IQHQRVH 

HECSECGKSFSRKTHLTQHQRVH 

YQCKECGKSFSQSGLIQHQRIH 

YQCNQCGKAFSQSAGLILHQRIH 

YHCKECGKVFSQSAGLIQHQRIH

YNCNECRKTFSQSTYLIQHQRIH 

YHCKECGKAFSQNTGLILHQRIH 

YGCNECGRAFSEKSNLINHQRIH 

YKCNECGRAFSQKSGLIEHQRIH 

HKCDECGKAFSRNSGLIQHQRIH 

HKCDECGKAFSRNSGLIQHQRIH 

HKCEECGKAFSRSSGLIQHQRIH 

HKCEECGKAFSRSSGLIQHQRIH 

YKCLECGKAFSQNSGLINHQRIH 

YKCLECGKAFSQNSGLINHQRIH 

YQCSECGKAFSQKSHHIRHQKIH 

YQCNECGKAFIQRSSLIRHQRIH 
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ZN3 5 HUMAN 


761 


YDCSECGKAFSQLSSLIVHQRIH


ZNO 7_HUMAN 


762 


YRCEECGKAFGQ S S S L IHHQRI H 


O60765__HUMAN 


763 


FKCNTCGKTFRQSSSRIAHQRIH 


OZF_HUMAN 


764 


FKC S ECGTAFGQKKYL I KHQNI H 


OZF HUMAN 


765 


FECNECGKAFSQKQYVIKHQNTH 


Q92 9 5 INHUMAN 


766 


FECTHCGKS FRAKGNLVTHQRIH 


OZF HUMAN 


767 


FECNECGKS F S QKENLLTHQKI H 


ZN74 HUMAN 


768 


FKCNECGKAFSSHAYLIVHRRIH


ZN74 HUMAN 


769 


FKCADCGKGFSCHAYLLVHRRIH


060765 HUMAN 


770 


FKCSECGRAFSQSASLilQHERIH 


ZN3 5 HUMAN 


771 


FECHECGKAF I QSANLWHQRI H 


ZN3 5 HUMAN 


772 


FTCSVCGKGFSQSANLiWHQRIH 


ZN3 5 HUMAN 


773 


FACNDCGKAFTQSANL I VHQRSH 


O14709 HUMAN 


774 


YKCNECGKDFSQNKNLWHQRMH 


014709 HUMAN 


775 


YKCDECGKTFAQTTYIilDHQRLH 


O14 7 09 HUMAN 


776 


YKCNECGKVFSQNAYLiIDHQRLH 


014709 HUMAN 


777 


YKCTECGKAFTQSAYIiFDHQRLH 


O14709 HUMAN 


778 


YKCNE CGKAF S Q S AYIiLNHQR I H 


Z157 HUMAN 


779 


YQCNECGKSFRVHSSLGIHQRIH 


060765 HUMAN 


780 


YNCNECGKALSSHSTIillHERIH 


EVIl HUMAN 


781 


YKCDQCPKAFNWKSNL I RHQMSH 


Q15776 HUMAN 


782 


YQCNVCGKAFSYRSAIjLSHQDIH 


O43309 HUMAN 


783 


YECNECGKAFVYNS SLVSHQE IH 


Z200 HUMAN 


784 


YGCKKCGRRFGRLSNCTRHEKTH 


015361 HUMAN 


785 


YGCKKCGRRFGRLSNCTRHEKTH 


ZNO 7 HUMAN 


786 


YKCNDCGKAFNRSSRIiTQHQKIH 


ZN74 HUMAN 


787 


YQCGSCGKAFTCHSSLTVHEKIH 


ZN3 5 HUMAN 


788 


YVCSKCGKAFTQSSNLTVHQKIH 


Z140 HUMAN 


789 


YEC I ECGKAFRRF SHLTRHQS I H 


060893 HUMAN 


790 


YQCNMCGKAFRRNSHLLRHQRIH 


Q13396 HUMAN 


791 


YSCTECEKSFVQKQHLLQHQKIH 


043361 HUMAN 


792 


YECTQCAKAFVRKSHLVQHEKIH 


043361 HUMAN 


793 


YECTECEKAFVRKSHLVQHQKIH 


075123 HUMAN 


794 


YECKECGKAFLQKAHLTEHQKIH 


075290 HUMAN 


795 


YECKECGKGFNRGAHLIQHQKIH 


075290 HUMAN 


796 


YECKECGKGFNRGAHLIQHQKIH 


075290 HUMT^ 


797 


FECKECGKAFRLHMQLIRHQKLH 


075290 HUMAN 


798 


FECKECGKAFRLHMHLIRHQKLH 


O75290_HUMAN


799 


FECKECGKAFRLHIQFTRHQKFH 


O75290_HUMAN 


800 


YECKECGKAFRLYLQLSQHQKTH 


Z140_HUMAN


801 


YECTECGKAFSRASNIjTRHQRIH 


043296_HUMAN 


802 


YECVECGKAFTRMSGLTRHKRIH 


0432 96_HUMAN 


803 


YECMECGKAFNRKSYLTQHQRIH 


014913_HUMAN 


804 


HECVECGKRFS S S SRLQEHQKIH 


EVI1_HUMAN 


805 


HACPECGKTFATSSGLKQHKHIH 


015535_HUMAN 


806 


YECNECGKAFSRSSGLFNHRGIH


Z132_HUMAN 


807 


YECNDCGKAFSNSSTLIQHQKVH 


Z132_HUMAN 


808 


YECIQCGKAFSERSTLVRHQKVH 
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•71 7 9 UT7MAT<r 

X O ^ Jn \J i' JXILLM 


809 


YECDECGKAFSNRSHLIRHEKVH 


«7T OA ■plTMAKr 


810 


YECQKCGKAF SRA.S TLWKHKKTH 


•7TST*5 R TTTTMAISJ 


811 


FKCNECEKAFS YS SQLARHQKVH 




812 


FE C S ECGKAF S YL SNLNQHQKTH 




813 


FRC SE CGKAF S HGSNL S QHRKI H 




814 

KJ J- TC 


FACPQCGRAFSHS SNLTQHQLLH 


KJ^r XlUiXLrtxSI 


O -L —J 


FACKVCGKVFSHKSNIjTEHEHFH 
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XJ JL> VJ XX\^X ^x^X>l 


1078 


YECSECGRAFSQSSNLiSQHQRIH 


Z132 HUMAN 


1079 


YECSECGRAI^TNNNSNIiAQHQKVH 


Z23 9 HUMAN 


1080 


YECEECGMSFSQRSNLHIHQRDH 


000153 HUMAN 


1081 


HQCQVCGKTFSQSGSRNVHMRKH 


0133 98~HUMAN 


1082 


YVCGECGKS FSHS SNLKNHQRVH 


015322 HUMAN 

' JL, ^ ^-a ^ XX V_/X XLXXM 


1083 


YKCE I CGKS FCLRS S LNRH YMVH 


07 5 1 2 3^HUMAN 


1084 


FKCAQCGKAF CHS SDL I RHQRVH 


014913^HUMAN 


1085 


YKCEECDKAFL YHS FLRRHKAVH 


014913 HUMAN 


1086 


YKCEECDKAFLHHSYLRKHQAVH 


ZN83 HUMAN 


1087 


FKCNECGKIiFRDNSYLVRHQRFH 


015322 HUMAN 


1088 


HTCNE CGKS FC YI SALRI HQRVH 


O607 92]]hUMAN 


1089 


FGCNDCGKSFRYRSALNKHQRLH 


Z137_HUMAN 


1090 


YKCNKCGKI FRHRS YLAVTQRTH 


075123__HUMAN 


1091 


YVCNVCGKDFIHYSGLIEHQRVH 


Z134_HUMAN 


1092 


YKCNECGKYFSHHSNLIVHQRVH 


04 3 3 61_HUMAN 


1093 


FECS I CGKFFSHRSTLNMHQRVH 


Z13 4_HUMAN 


1094 


FECIECGKFFSRSSDYIAHQRVH 


Z13 4_HUMAN 


1095 


FVCS KCGKDF I RTSHLVRHQRVH 


014913 HUMAN 


1096 


YKCQECGKSFCYRSYLREHYRMH 
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Z174_HUMAN 1097 

O60765_HU3yiAN 1098 

043167_HUMAN 1099 

043 82 9_HUMAN 1100 

O00403_HUMAN 1101 

075626_HXJMAN 1102 

015322_HUMAN 1103 

BCL6_HUMAN 1104

Z195_HUMAN 1105 

ZN85_HUMA]Sr 1106 

Z239_HUMAN 1107 

Z239_HUMA]Sr 1108 

015322_HUMAN 1109 

015322_HUMAN 1110 

014913_HUMAN 1111 

014913_HUiyiAN 1112 

014913_HUMAN 1113 

ZN4 5_HIJMAN 1114 

ZN45_HUMZysr 1115 

ZN45_HUMAN 1116 

ZN45_HUMAN 1117 

ZN45_HUMAN 1118 

ZN45__HUMAN 1119 

ZN45_HUMAN 1120 

ZN4 5_HUMAN 1121 

ZN45_HUMAN 1122 

ZN45_HUMAN 1123 

ZN4 5_HUMAN 1124 

ZN45_HUiyiAN 1125 

075467_HUMAN 1126 

ZN42_HUMAN 112 7 

O60765_HUMAN 1128 

TYY1_HUMAN 1129

015391_HUMAN 1130 

TYYl^HUMAN 1131 

015391_HUMAN 1132 

Q14872_HUMAN 1133 

GLI1_HUMAN 1134 

GLI3_HUMAN 1135 

O6 02 55_HUMAN 1136 

O60254__HUMAN 1137 

O60253_HUMAN 1138 

O60252_HUMAN 1139

GLI2_HUMAN 1140 

O9540 9_HUMA]Sr 1141 

Q15915_HUMAN 1142 

ZIC3_HUMAN 1143 

GLI1_HUMAN 1144
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YKCDD CGKS FTWNS ELKRHKRVH 
YRCKECGKSFSRRSGLFIHQKIH 
YSCGI CGKSFSDS S AKRRHC ILH 
FVCEMCTKGFTTQAHLKEHLKIH 
FVCEMCTKGFTTQAHLKEHLKIH 
FKCQTCNKGFTQLAHLQKHYLVH 
FKCEQCGKGFRCRAI LQVHCKLH 
YKCETCGARFVQVAHLRAHVLIH 
YKCEKCGKAFTQFSHIiTVHESlH 
YKCKKCGKAFNQSAHLTTHEVIH 
YKCEKCGKGFTRSSSLLIHHAVH 
YKCEQCGKGFTRSSSLLIHQAVH 
YKCEECGKGFTDSLDLHKHQIIH 
YICSKCGRAFIHDLKLQKHQIIH 
YKCEKCGKGFFRSSDLQHHQKIH 
YKCEECGKCFSSFTSLKRHQI IH 
YP YKCEECGKGFSRS S KLQEHQT I H 
YKGEHCVKSFSWSSHLiQINQRAH 
YKCEECGKGFSWSSSLIIHQRVH
YKCEECGKVFSWSSYLQAHQRVH 
YKCEKCDNAFRRFS SLQAHQRVH 
YKCERCGKAFSQFSSLQVHQRVH 
YKCEECGVGFSQRSYIiQVHLiKVH 
YKCEECGKSFSWRSRLQAHERIH 
YKCEECGKGFSVGSHLQAHQISH 
YQCAECGKGFSVGSQLQAHQRCH 
YQCEECGKGFCRASNFLAHRGVH 
YKCEECGKGFCRASNLLDHQRGH 
YKCEECGKGFSQASNLLAHQRGH 
FVCALCGAAFSQGS S LFKHQRVH 
YHCGECGLGFTQVSRLTEHQRIH 
YRCNECGKGFTS I SRLNRHRI IH 
YVCPFDGCNKKFAQSTNLKSHILTH 
FVCPFDVCNRKFAQSTNLKTHILTH 
FQCTFEGCGKRFSLDFNLRTHVRIH 
FQCTFEGCGKRFSLDFNLRTHLRIH 
YQCTFEGCPRTYSTAGNLRTHQKTH 
HKCTFEGCRKSYSRLENLKTHLRSH 
HKCTFEGCTKAYSRLENLKTHLRSH 
HKCTFEGCSKAYSRLENLKTHLRSH
HKCTFEGCSKAYSRLENLKTHLRSH 
HKCTFEGCSKAYSRLENLKTHLRSH 
HKCTFEGCSKAYSRLENLKTHLRSH 
HKCTFEGCSKAYSRLENLKTHLRSH 
FQCEFEGCDRRFANS SDRKKHMHVH 
FKCEFEGCDRRFANSSDRKKHMHVH
FKCEFEGCDRRFANS SDRKKHMHVH 
YMCEHEGCSKAFSNASDRAKHQNRTH 
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O60255_HUMAN 1145 

O60 2 54_HUiyiAN 1146 

O60253_HUiy[AN 1147 

O60252_HUMAN 1148 

GLI3_HUMAN 1149 

GLI2_HUMAN 1150 

Z143_HUM2\N 1151 

TF3A_HUMAN 1152 

TF3A_HUMAN 1153 

Q14872_HUMAN 1154 

Q14 872_HUMAN 1155 

ZN7 6_HUMAN 1156 

Z143_HUMAN 1157 

Q14872_HUMAN 1158 

O00153_HUMAN 1159 

ZN76_HUMAN 1160 

Z143_HUMAN 1161 

Q15915_HUiyiAN 1162 

O95409_HUMAN 1163 

ZIC3_HUMAN 1164 

ZN76_HXJMAN 1165 

Z143_HU]yiAN 1166 

O00153_HUMAN 1167 

ZN76_HUiyiAN 1168 

Z143_HUMAN 1169 

Q14872_HUMAN 1170 

ZN76_HUMAN 1171 

Z143_HUMAN 1172 

BTE INHUMAN 1173 

BTE2_HIJMAN 1174 

043839_HUMAN 1175 

UKLF_HUMAN 117 6 

O9560 0_HUMAN 1177 

Q13118_HU]y[AN 1178 

075411_HUMAN 1179 

EZF_HUMAN 1180 

014901_HUMAN 1181 

SP4_HUMAN 1182 

O604 02_HUiyiAN 1183 

EKLF_HUMAN 1184 

WT1_HUMAN 1185 

Q16256_HUMAN 1186 

Q15881_HUMAN 1187 

SP2_HUMAN 1188 

043167_HUMAN 118 9 

075467_HUMAN 1190 

ZEP1_HUMAN 1191 

Q02646 HUMAN 1192 
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YVCEHEGCNKAF SNASDRAKHQNRTH 

YVCEHEGCNKAFSNASDRAKHQNRTH 

YVCEHEGCNKAFSNASDRAKHQNRTH 

YVCEHEGCNKAFSNASDRAKHQNRTH 

YVCEHEGCNKAFSNASDRAKHQNRTH 

YVCEHEGCNKAFSNASDRAKHQNRTH 

YVCTVPGCDKRFTEYS SLYKHHWH 

FKCTQEGCGKHFASPSKLKRHAKAH 

FVCDYEGCGKAFIRDYHLSRHILTH 

FECDVQGCEKAFNTLYRLKAHQRLH 

FVCNQEGCGKAFLTSHSLRIHVRVH 

YRCDFPSCGKAFATGYGLKSHVRTH 

YQCEHAGCGKAFATGYGLKSHVRTH 

FRCDHDGCGKAFAASHHLKTHVRTH 

F I C PAEGCGKS F YVLQRLKVHMRTH 

FQCPFEGCGRSFTTSNIRKVHVRTH 

FKCPFEGCGRSFTTSNIRKVHVRTH 

FPCPFPGCGKVFARSENLKIHKRTH 

FPCPFPGCGKVFARSENLKIHKRTH 

FPCPFPGCGKIFARSENLKIHKRTH 

YTC P E PHCGRGFT S ATN YKNH VR I H 

YYCTE PGCGRAFAS ATNYKNHVRI H 

FMCHESGCGKQFTTAGNLKNHRRIH 

YKCPEELCSKAFKTSGDLQKHVRTH 

YRCSEDNCTKSFKTSGDLQKHIRTH 

FNCESEGCSKYFTTLSDLRKHIRTH 

FRCGYKGCGRLYTTAHHLKVHERAH 

FRCEYDGCGKLYTTAHHLKVHERSH 

HKCPYSGCGKVYGKSSHLKAHYRVH 

HYCDYPGCTKVYTKS SHLKAHLRTH 

HRCHFNGCRKVYTKS SHLKAHQRTH 

HRCQFNGCRKVYTKS SHLKAHQRTH 

HQCDFAGCSKVYTKSSHLKAHRRIH 

HICSHPGCGKTYFKSSHLKAHTRTH 

HICSHPGCGKTYFKSSHLKAHTRTH 

HTCDYAGCGKTYTKS SHLKAHLRTH 

YVCSFPGCRKTYFKSSHLKAHLRTH 

HICHIEGCGKVYGKTSHLRAHLRWH 

HI CHI EGCGKVYGKTSHLRAHLRWH 

HTCAHPGCGKSYTKSSHLKAHLRTH 

FMCAYPGCNKRYFKLSHLQMHSRKH 

FMCAYPGCNKRYFKLSHLQMHSRKH 

FMCAYPGCNKRYFKLSHLQMHSRKH 

HVCHI PDCGKTFRKTSLLRAHVRLH 

YACKDCHRKFMDVSQLKKHLRTH 

YACRACSKVFVKSSDLLKHLRTH 

YICEYCNRACAKPSVLLKHIRSH 

YICPYCSRACAKPSVLKKHIRSH 
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075362_HUMAN 1193 

Q92981_HUMAN 1194 

O76019_HUMAN 1195 

RRE1_HUMAN 1196 

075626_HUMAN 1197 

Z202_HUMAN 1198 

075123_HUMAN 1199 

Z151_HUMAN 1200 

SNAI_HUMAN 1201 

043623_HUMAN 1202 

O954 0 9_HU]yiAN 1203 

ZIC3_HUMAN 1204 

O00146__HUMAN 1205 

O00146__HUMAN 1206 

IKAR_HUMAN 1207 

CTCF_HUMAN 1208 

HKR3_HUMAN 1209 

Q15552_HUMAN 1210 

043591_HUMAN 1211 

PLZF__HUMAN 1212 

Z151_HUMAN 1213 

MAZ_HUMAN 1214 

014753__HUMAN 1215 

095365_HUiyiAN 1216 

015156_HUMAN 1217 

O75066_HU]VIAN 1218 

095365_HUMAN 1219 

015156_HUMAN 1220 

Z151_HUMAN 1221 

Z151_HUMAN 1222 

Z151_HUMAN 1223 

015090 HUMAN 1224 
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YACSYCGKFFRSNYYLNIHIiRTH 

YKCVQPDCGKAFVSRYKLMRHMATH 

YKCVQPDCGKAFVSRYKIiMRHMATH 

YAC S VCNKRFWS LQDLTRHMRSH 

HECQVCHKRFSSTSNLKTHLRLH 

HDCSVCGKSFTCNSHLVRHLRTH 

YACDICGKTFTFNSDLVRHRI SH 

HKCSVCSKAFVNVGDLSKHI IIH 

YACVCGTCGKAFSRPWLLQGHVRTH 

YACVCKI CGKAFSRPWLLQGHIRTH 

HVCFWEECPREGKPFKAKYKLVNHIRVH 

HVCYWEECPREGKSFKAKYKLVNHIRVH 

HECKLCGASFRTKGSLIRHHRRH 

HVCQFCSRGFREKGSIiVRHVRHH 

FQCNQCGASFTQKGNLLRHI KLH 

HKCHIjCGRAFRTVTIjLRNHLiNTH 

HVCEFCSHAFTQKANLNMHLRTH 

HVCEHCNAAFRTNYHLQRHVF I H 

HVCEHCNAAFRTNYHLQRHVFIH 

YI CSECNRTFPSHTALKRHIjRSH 

YVC I HCQRQFADPGALQRHVRI H 

YICAXjCAKEFKMGYNLRRHEAIH 

HLCTGCGKGFNDTFDLKRHVRTH 

YECNI CKVRFTRQDKLKVHMRKH 

YACEVCGVRFTRNDKLKIHMRKH 

YSCEECGAKFAANSTIiKNHLjRLH 

YLCCXJCGAAFAHNYDLKNHMRVH 

YSCPHCPARFLHSYDLKNHMHLH 

HKCEDCGKEFTHTGNFKRHIRIH 

YRCEDCGKLFTTSGNLKRHQIiVH 

YKCRECGKQFTTSGNLKRHLRIH 

YDCPYCGKTFRTSHHLKVHLRIH 
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Example 3: Non-human zinc finger databases. 

To provide novel combinations of non-antigenic, optimised zinc fingers for use in
species other than humans, separate species-specific zinc finger databases are required,
for example for mouse, chicken, pig and cow.

The fingers listed below are in a format that can be linked with classical wild-type
canonical "TGEKP" linkers (i.e., ...TGEKP - zinc finger peptide sequence - TGEKP -
zinc finger peptide sequence - TGEKP - etc.). For each peptide sequence, an
oligonucleotide is designed to encode the peptide sequence; the oligonucleotide can then
be linked into a library selection system, as described in the Examples infra.
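
The linking format and the peptide-to-oligonucleotide step described above can be illustrated with a minimal sketch. It is not taken from the disclosure: the function names `join_fingers` and `back_translate` are hypothetical, and the codon table simply picks one arbitrary valid codon per amino acid, whereas a practical design would choose codons to suit the expression host and the cloning strategy.

```python
# Illustrative sketch only: helper names and codon choices are assumptions,
# not part of the original disclosure.

LINKER = "TGEKP"  # canonical wild-type linker named in the text

# One arbitrary standard codon per amino acid (assumed; a real design would
# consider host codon usage and restriction sites).
CODON = {
    "A": "GCC", "C": "TGC", "D": "GAC", "E": "GAG", "F": "TTC",
    "G": "GGC", "H": "CAC", "I": "ATC", "K": "AAG", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCC", "Q": "CAG", "R": "CGG",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAC",
}


def join_fingers(fingers, linker=LINKER):
    """Build linker - finger - linker - finger - ... - linker."""
    # Strip stray spaces and normalise case before joining.
    cleaned = [f.replace(" ", "").upper() for f in fingers]
    return linker + linker.join(cleaned) + linker


def back_translate(peptide):
    """Encode a peptide as DNA using the single-codon table above."""
    return "".join(CODON[aa] for aa in peptide)


if __name__ == "__main__":
    # Two finger sequences taken from the mouse table (SEQ ID NO 1226, 1237).
    fingers = ["HRCEYCKKGFRRPSEKNQHIMRH", "FRCKRCRKGFRQQSELKKHMKTH"]
    peptide = join_fingers(fingers)
    print(peptide)
    print(back_translate(peptide))
```

A lookup that raises `KeyError` on an unexpected character is deliberate here: it flags residues outside the twenty standard amino acids rather than silently encoding them.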

Mouse Zinc Finger Database. 

544 zinc finger units



Name 


SEQ ID NO 


Peptide sequence 


035745_MOUSE


1225 


HQCTHCEKTFNRKDHLKNHLQTH


ZFX2_MOUSE


1226 


HRCEYCKKGFRRPSEKNQHIMRH 


ZFX1_MOUSE


1227 


HRCEYCKKGFRRPSEKNQHIMRH 


ZFY2_MOUSE


1228 


HKCDMCSKGFHRPSELKKHVATH 


ZFY1_MOUSE


1229 


HKCDMCSKGFHRPSELKKHVATH 


ZFX2_MOUSE 


1230 


HKCDMCDKGFHRPSELKKHVAAH 


ZFX1_MOUSE


1231 


HKCDMCDKGFHRPSELKKHVAAH 


ZFA_MOUSE 


1232 


HKCDMCDKGFHRPSELKKHVAAH


Q9Z162_MOUSE 


1233 


YTCSVCGKGFSRPDHLSCHVKHVH 


MAZ_MOUSE 


1234 


YNCSHCGKSFSRPDHLNSHVRQVH


Q08376_MOUSE


1235 


YSCEVCGKSFIRAPDLKKHERVH 


Z151_MOUSE 


1236 


HKCPHCDKKFNQVGNLKAHLKIH


ZFX2_MOUSE


1237 


FRCKRCRKGFRQQSELKKHMKTH 


ZFX1_MOUSE


1238 


FRCKRCRKGFRQQSELKKHMKTH 


Q62518_MOUSE


1239 


YVCTMCGKGYTLNSNLQVHLRVH 


Q60636_MOUSE 


1240 


YECNVCAKTFGQLSNLKVHLRVH 


Q9Z117_MOUSE 


1241 


CSCPECGKVLHQLSHLRSHYRLH 


Q61898_MOUSE 


1242 


CSCPECGREFHQLSHLRKHYRLH 


088631_MOUSE


1243 


YSCQYCGKVFHQLSHFKSHFTLH 


Q61164_MOUSE 


1244 


HKCPDCDMAFVTSGELVRHRRYKH


035483_MOUSE


1245 


FRCADCGRGFAQRSNLAKHRRGH 


035483_MOUSE 


1246


FVCGVCGAGFSRRAHLTAHGRAH


O70162_MOUSE 


1247 


FVCRDCGQGFVRSARLEEHRRVH 


Q9Z1D8_MOUSE


1248 


HRCGDCGKFFLQASNFIQHRRIH 


035483_MOUSE


1249 


HRCPDCGKGFGHSSDFKRHRRTH


035483_MOUSE


1250 


ADCGKSFVYGSHLARHRRTH 
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035483_iyiOUSE 

08 82 82_MOUSE 

Q61065_MOUSE 

BCIj6_MOUSE 

O70162_MOUSE 

O70162_MOUSE 

Q9Z0G7_MOUSE 

Q0 83 76_MOUSE 

Q64318_M0USE 

Q64318__MOUSE 

Q9ZlDB_MOUSE 

Q9Z1D8__M0USE 

Q9Z2X6_M0USE 

KIDl_MOUSE 

Q9Z1D7_M0USE 

ZF9 0_MOUSE 

Q9Z2X6__MOUSE 

Q9Z2X6__M0USE 

Q9Z2X6_MOUSE 

Q9Z2X6__M0USE 

Q9Z2X6_MOUSE 

Q9Z2X6_iyiOUSE 

Q9Z2X6_MOUSE 

ZF3 7_MOUSE 

Q62 514___M0USE 

Q614 91_MOUSE 

ZF3 7jyiOUSE 

Q62514jyiOUSE 

Q61491_MOUSE 

Q61491__MOUSE 

Q6 14 9 l_MOUSE 

QS14 91_MOUSE 

Q61491_MOUSE 

Q61491_MOUSE 

Q614 91_MOUSE 

Q61491_MOUSE 

Q61491_MOUSE 

Q9Z2X6_MOUSE 

Q9Z2X6_MOUSE 

Q6 14 9 l_MOUSE 

Q9Z2X6_MOUSE 

Q614 91_MOUSE 

Q64247jy[OUSE 

Q9Z2X6_MOUSE 

Q9Z2X6__iyiOUSE 

Q64247_MOUSE 

Q64247_MOUSE 

Q64247_MOUSE 
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12 51 FPCPDCGKRFVYKSHLVTHRRIH 

1252 YKCQLCRS AFRYKGNLASHRTVH 

1253 YKCDRCQASFRYKGNLASHKTVH 

1254 YKCDRCQAS FRYKGNIiASHKTVH 
12 55 FACQDCGRRFNQSTKLIQHQRVH 

1256 " - CVECGERFGRRS VLLQHRRVH 

1257 -DCPVCNKKFKMKHHLTEHMKTH 

12 58 HMCDKAFKHKSHLKDHERRH 

1259 HECGICRKAFKHKHHLIEHMRIjH 

12 60 FKCTECGKAFKYKHHLKEHLRIH 

12 61 FKCNECGKGFGRRSHLAGHLRLH 

1262 YGCNECGKSFGRHSHL,IEHLKRH 

1263 YVCKQCGKAFTLSSSLRRH 

.12 64 YVCKECGKAFTIiSTSLYKHLRTH 

12 65 HGCDECGKSFTQHSRLIEHKRVH 

12 66 YRCNLCGRSFRHSTSLTQHEVTH 

1267 YVCKECGKAFARSTSLHIHEGTH 

12 68 YVCKHCGKAYTTYNTLRAHERSH 

12 6 9 YVCKHCGKAYTTYNTLRAHERSH 

12 7 0 YVCKHCGPCAYTS YSTLiRAHERSH 

1271 YVCKHCGKAYTS YS TLRAHERSH 

1272 YVCKHCGKAYTS YSTLRAHERSH 
12 73 YVCKHCGKAFTQSSYLRIHKRTH 
1274 YECEQCGKAHGHKHALTDHLRIH 
12 75 YECEQCGKAHGHKHALTDHLRIH 
1276 YECNQCGKAFTQFFPLKRHEI TH 
12 77 YKCDECGKAFGHSSSLTYHMRTH 
127 8 YKCDECGKAFGHSSSLTYHMRTH 
12 79 YQCNQCAKAFP YHRTLQI HERTH 
12 8 0 CEYNQCWKAFAYHKTLQIHERTH 
12 81 YECNQCGKT^ACYQSFQIHKRTH 
12 82 YECNQCGKAFACNRYLQIHKRTH 
12 83 YECNQCGKAFACPRYLQI HKRTH 
12 84 YECNQCGKAF ACLRNLQNHKTTH 
12 85 FECNQCGKAFAHHSTLQRHKRTH 
12 8 6 YECNQCGKAFTRHSTLQIHKRTH 
12 87 YECNQCGKAFTCRSNLQIHKRTH 
1288 YVCKQCGKAFTRSSHLQIHKITH 
12 8 9 YICKQCGKAFARSSHLQIHKRSH 
12 90 YKCKQCGKDFTHHSTLHIHKRIH 
12 91 YSCKLCGKAFTHSNYLQIHKRIH 
12 92 YECNQCGKAFARNSNLLDHKRIH 
12 93 YICKQCGKTFRYLSCFQKHERIH 
1294 YACKQCDKAFKYLSSLQNHKRIH 
12 95 HACKQCGKSFKRQSNVQAHERJSTH 
12 96 YTCKHCTKTFTTS STRNSHEKTH 
12 97 YACKHCGKAFTTSSARNSHERIH 
1298 YACKHCGKAFTSSSDRNSHERIH 
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Q64247_MOUSE 12 99 

Q6424 7_MOUSE 13 00 

088412_MOUSE 1301 

ZF3 5_MOUSE 13 02 

Q9Z2X6_MOUSE 13 03 

ZF3 8__MOUSE 13 04 

OZF_MOUSE 1305 

Q9Z0Q5_MOUSE 13 0 6 

ZF90_MOUSE 13 07 

OZF_MOUSE 13 08 

Q9Z0Q5_MOUSE 13 0 9 

ZF90__MOUSE 1310 

Z151__MOUSE 1311 

OZF_MOUSE 1312 

Q9Z0Q5_MOUSE 1313 

Q9Z162_MOUSE 1314 

Q9Z162_MOUSE 1315 

MAZ_MOUSE 1316 

Q61898_MOUSE 1317 

Q60585_MOUSE 1318 

03 54 83_MOUSE 1319 

Q60585_MOUSE 1320 

Q9ZlD9__MOUSE 1321 

Q9ZlD9_MOUSE 1322 

088631_MOUSE 1323 

Q60585_MOUSE 1324 

iyiLZ4_M0USE 1325 

Q9Z116__MOUSE 132 6 

O70237_MOUSE 1327 

GFI1_M0USE 1328 

Q61624_MOUSE 132 9 

P974 7 5_MOUSE 133 0 

Q61624_MOUSE 1331 

P97475_MOUSE 1332 

YPCKYCGKAFATSSDRNSHERIH
YSCTHCGKAFSSPSDYNSCERIH
YVCNECGKAFTCSSYLIilHQRIH
YMCNHCYKHFSQSSDLIKHQRIH
YVCKQCGKAFAQSSYLHIHQRSH
YQCKDCGKAFSGKGSLIRHYRIH
YECNKCGKAFSRITSLIVHVRIH
YECNECGKAFSQRTSLIVHVRIH
YQCNVCGKAFKRSTSFIEHHRIH
YECKICGKAFCQSSSLTVHMRSH
YECNVCGKAFSQSSSLTVHVRSH
YECIDCGKAFSQSSSLIQHERTH
CQCVICGKAFTQASSLIAHVRQH
YECKGCGKAFIQKSSLIRHQRSH
FECKDCGKAFIQKSNLIRHQRTH
TYCSKAFRDSYHLRRHQSCH
HACEMCGKAFRDVYHLNRHKLSH
HACEMCGKAFRDVYHLNRHKLSH
FRCTECDKSFIRSSHLREHQKIH
FDCKECGKTFSRGYHLTLHQRIH
YACAECGRRFGQSAALTRHQWAH
YACTECGKSFRQVAHLTRHQRLN
YACPECGECFRQSSHLSRHQRTH
YKCFQCGERFRQSTHLVRHQRIH
YKCTKCDKLFTQYSHLRRHQRIY
YKCTECKKAFRQHSHLTYHQRIH
HKCTECAKASAASPHLIQHQRTH
YECTECSKAFCQKSHLTQHQRVH
YPCQFCGKRFHQKSDMKKHTYIH
YPCQYCGKRFHQKSDMKKHTFIH
FRCDECGMRFIQKYHMERHKRTH
FRCDECGMRFIQKYHMERHKRTH
FQCSQCDMRFIQKYLLQRHEKIH
FQCSQCDMRFIQKYLLQRHEKIH
ZFP1_MOUSE 1333 FVCNYCDKTFSFKSLLVSHKRIH
Q9Z116_MOUSE 1334 YICFECRKAFYRKSELTDHQRIH
Q9Z116_MOUSE 1335 YECKECGKAFCQKPQLTLHQRIH
ZFP1_MOUSE 1336 YGCSECGKTFAQKFELTTHQRIH
Q06054_MOUSE 1337 YKCSDCGKCFIQKANLRTHQKIH
Q06054_MOUSE 1338 YKCSDCGKCFIQKANLRTHERIH
Q06054_MOUSE 1339 YKCSDCDKCFIQKAKLKKHQRIH
Q06054_MOUSE 1340 YKCSECDKCFIQKDHLRTHQRLH
Q06054_MOUSE 1341 YKCSECDKCFIRKANLRRHHRIH
Q06054_MOUSE 1342 YKCSECHKCFIRKAHLRRHQRIH
Q06054_MOUSE 1343 YKCSECHKCFIQQAHLRRHQKIH
Q06054_MOUSE 1344 YICAECNKCFIQKSQLKTHQRIH
MLZ4_MOUSE 1345 HICSQCGKAFSQISDLNRHQKTH
ZF37_MOUSE 1346 YECNECGIAFSQKSHLVLHQRTH
Q62514_MOUSE 1347 YECNECGIAFSQKSHLVLHQRTH
ZF37_MOUSE 1348 YECVECGKAFSQKSHLIVHQRPH
Q62514_MOUSE 1349 YECVECGKAFSQKSHLIVHQRTH
ZF37_MOUSE 1350 FECNECGKTFSKKSHLVIHQRTH
Q62514_MOUSE 1351 FECNECGKTFSKKSHLVIHQRTH
MFG3_MOUSE 1352 FECKECGKAFHFSSQLNNHKTSH
Q62514_MOUSE 1353 FECYECGKAFNAKSQLVIHQRSH
ZF37_MOUSE 1354 FECYECGKAFNAKSQLVIHQRSH
Q9Z116_MOUSE 1355 YECKICGKCFYWKTSFNRHQSTH
O88412_MOUSE 1356 YSCNECGKAFRQKSSLTVHQRTH
Q9Z116_MOUSE 1357 YECAECGKAFSTKSYLTVHQRTH
P70405_MOUSE 1358 YECSKCGKTFRGKYSLDQHQRVH
ZF90_MOUSE 1359 HECADCGKTFLWRTQLTEHQRIH
KR2_MOUSE 1360 YECMICGKHFTGRSSLTVHQVIH
KR2_MOUSE 1361 YECDQCGKAFIKNSSLIVHQRIH
Q9Z1D7_MOUSE 1362 YKCSVCGKAFIQKISLIEHBQIH
Q61116_MOUSE 1363 YKCDTCGKAFSQKSSLQVHQRIH
O70237_MOUSE 1364 --CRMCGKAFKRSSTLSTHLLIH
GFI1_MOUSE 1365 -DCKICGKSFKRSSTLSTHLLIH
Q9Z150_MOUSE 1366 HSCGICGKCFTQKSTLHDHLNLH
Q9Z1D7_MOUSE 1367 YKCEVCGKTFRWRTVLIRHKWH
ZF35_MOUSE 1368 YKCMCGKAFSQCSAFTLHQRIH
ZF38_MOUSE 1369 YKCKECGKAFNHSSNFNKHHRIH
OZF_MOUSE 1370 YGCNECGKAFSQFSTLALHMRIH
Q9Z0Q5_MOUSE 1371 YGCNECGKAFSQFSTLALHLRIH
ZFP1_MOUSE 1372 YECTECGKTFSQRSTLRLHLRIH
MLZ4_MOUSE 1373 YKCDECGKNFSQNSDLVRHRRAH
Q62514_MOUSE 1374 YECNECGKAFKYGSSLTKHMRIH
ZF37_MOUSE 1375 YECNECGKAFKYGSSLTKHMRIH
KR2_MOUSE 1376 YKCHDCGKAFSKNSSLTQHRRIH
P70405_MOUSE 1377 CRDCGKFFSQTSHLNDHRRIHTG
Q61117_MOUSE 1378 YKCSTCGKGFSRSSDLNVHCRIH
ZF92_MOUSE 1379 YLCQQCGKSFSRSFNLIKHRIIH
ZF29_MOUSE 1380 YACKECGESFSYNSNLIRHQRIH
O88282_MOUSE 1381 YRCSICGARFNRPANLKTHSRIH
Q61065_MOUSE 1382 YRCNICGAQFNRPANLKTHTRIH
BCL6_MOUSE 1383 YRCNICGAQFNRPANLKTHTRIH
ZF29_MOUSE 1384 YKCRDCGKSFSRSANLITHQRIH
Q9Z1D7_MOUSE 1385 YQCLQCNKSFNRRSTLSQHQGVH
ZF35_MOUSE 1386 YPCNSCSKSFSRGSDLIKHQRVH
ZF35_MOUSE 1387 YPCSWCIKSFSRSSDLIKHQRVH
ZF35_MOUSE 1388 YPCNQCTKSFSRLSDLINHQRIH
ZFP1_MOUSE 1389 YECDVCQKTFSHKANLIKHQRIH
ZF35_MOUSE 1390 YECDKCGKTFSQSSNLILHQRIH
O88412_MOUSE 1391 YECNECGKTFTRSSNLIVHQRIH
MLZ4_MOUSE 1392 YDCNECGKSFGRSSHLIQHQTIH
MLZ4_MOUSE 1393 YECTACGKSFSRSSHLITHQKIH
KR2_MOUSE 1394 YECTECGKAFSQSAYLIEHRRIH
ZF90_MOUSE 1395 YACKECGRNFSRSSALTKHHRVH
MLZ4_MOUSE 1396 YECTECDKSFSRSSAIilKHKRVH
P70405_MOUSE 1397 YKCSECGKSFSQSSILNQHRRIH
P70405_MOUSE 1398 YKCSECGNSFSQSAILNQHRRIH
Q9Z1D8_MOUSE 1399 HQCNECGKSFIQSAHLIQHRRIH
KID1_MOUSE 1400 YRCQECGMSFGQSSALIQHRRIH
P70405_MOUSE 1401 YECSQCGKSFSQKSGLIQHQWH
P70405_MOUSE 1402 YECRECGKSFSQKATLIKHQRVH
P70405_MOUSE 1403 YECSQCGKSFSQKATLVKHKRVH
Q9Z1D8_MOUSE 1404 HQCNECGRGFSLKSHLSQHQRIH
OZF_MOUSE 1405 YQCSECGKAFSQKSHHIRHQRIH
Q9Z0Q5_MOUSE 1406 YQCSECGKAFSQKSHHIRHQKIH
O88412_MOUSE 1407 YDCSECGKAFSQLSCLIVHQRIH
ZF35_MOUSE 1408 YKCSECGKAFNQSSVLILHQRIH
ZF35_MOUSE 1409 YKCDVCGKAFSQSSDRILHQRIH
KID1_MOUSE 1410 FKCNTCGKTFRQSSSRIAHQRIH
OZF_MOUSE 1411 YKCNECGTIFRQKQYLIKHHNIH
Q9Z0Q5_MOUSE 1412 FKCNECGTAFGQKKYLIKHQNIH
OZF_MOUSE 1413 FECSQCGRAFSQKQYLIKHQNIH
Q9Z0Q5_MOUSE 1414 FECNECGKAFSQKQYVIKHQSTH
OZF_MOUSE 1415 FKCNECGKAFSQKENLIIHQRIH
Q9Z0Q5_MOUSE 1416 FECSDCGKAFSQKENLLTHQKIH
KID1_MOUSE 1417 FKCSECGRAFSQSASLIQHERIH
O88412_MOUSE 1418 FECHECGKAFIQSANLWHQRIH
O88412_MOUSE 1419 FTCSECGKGFSQSANLWHQRIH
O88412_MOUSE 1420 FACSDCGKAFTQSANLIVHQRSH
KR2_MOUSE 1421 YKCHECGKAFSQSIVINLTVHQRTH
ZF38_MOUSE 1422 YQCNECGKSFSQHAGLSSHQRLH
KID1_MOUSE 1423 YNCNECGKALSSHSTLIIHERIH
O35700_MOUSE 1424 YKCDQCPKAFNWKSNLIRHQMSH
EVI1_MOUSE 1425 YKCDQCPKAFNWKSNLIRHQMSH
Q62518_MOUSE 1426 YKCDVCGKSFGWRSNLIIHHRIH
Q9Z1D8_MOUSE 1427 YACHLCGKAFRVRSHLVQHQSVH
Q9Z1D8_MOUSE 1428 YKCQVCGKAFRVSSHLVQHHSVH
Q9Z1D7_MOUSE 1429 YECNDCGKAFVYNSSLATHQETH
MFG3_MOUSE 1430 YKCNACGRAFNRRSNLMQHEKIH
MFG3_MOUSE 1431 YKCNVCGKAFNRRSNLLQHQKIH
O88412_MOUSE 1432 YVCGKCGKAFTQSSNLTVHQKIH
Q9Z116_MOUSE 1433 YECKECRKAFYDKSNLKRHQKIH
Q60585_MOUSE 1434 YECKECRKFFRRYSELISHQGIH
Q60585_MOUSE 1435 YECKECGKAFRQCAHLSRHQRIH
ZF37_MOUSE 1436 YECIECGKAFKQNASLTKHMKIH
Q62514_MOUSE 1437 YECIECGKAFKQNASLTKHMKIH
Q61849_MOUSE 1438 YECNECGKAFKRHRSFVRHQKIH
MFG3_MOUSE 1439 FECKDCGKVFRLNIHLIRHQRFH
Q61849_MOUSE 1440 YECKECGKAFRLPQQLTRHQKCH
Q06054_MOUSE 1441 HRCNECGKSLSSSSGLQRHQRIH
O35700_MOUSE 1442 HACPECGKTFATSSGLKQHKHIH
EVI1_MOUSE 1443 HACPECGKTFATSSGLKQHKHIH
ZF92_MOUSE 1444 YECGECGKTFTRSSNLVKHQVIH
O88412_MOUSE 1445 FKCSECEKAFSYSSQLARHQKVH
ZF90_MOUSE 1446 FECNVCGKAFRHSSSLGQHENAH
KID1_MOUSE 1447 YECNTCGKLFNHRSSLTNHYKIH
ZF29_MOUSE 1448 YKCDECGKSFSDGSNFSRHQTTH
OZF_MOUSE 1449 YKCGECGKAFSQRGNFLSHQKQH
O70162_MOUSE 1450 CDVCGKVFSQRSNLLRHQKIHTG
ZFP1_MOUSE 1451 YECNECAKTFFKKSNLIIHQKIH
O88412_MOUSE 1452 YKCKDCEKAFSCFSHLIVHQRIH
Q9Z1D7_MOUSE 1453 YKCNECGRAFGQWSALNQHQRLH
ZF90_MOUSE 1454 YQCSLCGKAFQRSSSLVQHQRIH
Q64247_MOUSE 1455 CGKVFILSGDLIKHERIH
MFG3_MOUSE 1456 YECEQCGSAFRLPYQLTQHQRIH
Q61849_MOUSE 1457 FECELCGSAFRCRSQLNKHLRIH
MFG3_MOUSE 1458 FKCKLCESAFRRKYQLSEHQRIH
Q61849_MOUSE 1459 FKCQECGKAFWLAYLIEHQSIH
Q64247_MOUSE 1460 FVCKQCGEAFVNSSHLISHERIH
MFG3_MOUSE 1461 FQCKECGRAFVRSTGLRIHERIH
Q64247_MOUSE 1462 FVCKTCGKAFSRSDYLINHKRIH
Q64247_MOUSE 1463 FVCKKCGKAFKRLGHFMNHERIH
ZF90_MOUSE 1464 FQCKECGKAFSRCSSLVQHERTH
MFG3_MOUSE 1465 FECKDCGKAFTVLAQLTRHQTIH
MFG3_MOUSE 1466 FHCKVCGKAFTVLAQLTRHENIH
MFG3_MOUSE 1467 FECKECGKSFKRVSSLVEHRIIH
ZFP1_MOUSE 1468 FECPECGKAFTHQSNLIVHQRAH
ZF92_MOUSE 1469 FECTECGKAFSRSSNLIEHQRIH
O54978_MOUSE 1470 FECQECGEAFARRSELIEHQKIH
O70162_MOUSE 1471 FRCTECGQSFRQRSNLLQHQRIH
O70162_MOUSE 1472 FACAECGQSFRQRSNLTQHQRIH
O70162_MOUSE 1473 FACPECGQSFRQHANLTQHRRIH
O70162_MOUSE 1474 YACAECGKAFRQRPTLTQHLRTH
O70162_MOUSE 1475 AECGKTFRQRATLTQHLCVHTGE
Q9Z1D8_MOUSE 1476 FRCEECGKSYNQRVHLIQHHRVH
Q9Z1D8_MOUSE 1477 FKCGECGKSYNQRVHLTQHQRVH
ZF37_MOUSE 1478 FECNQCGKAFKQIEGLTQHQRVH
Q62514_MOUSE 1479 FECNQCGKAFKQIEGLTQHQRVH
O88282_MOUSE 1480 YPCPTCGTRFRHLQTLKSHVRIH
Q61065_MOUSE 1481 YPCEICGTRFRHLQTLKSHLRIH
BCL6_MOUSE 1482 YPCEICGTRFRHLQTLKSHLRIH
Q60585_MOUSE 1483 YDCKECGKAFRVRQQLTLHERIH
Q60585_MOUSE 1484 YDCKECGKAFRVRGQLMLHQRIH
Q60585_MOUSE 1485 YECGECGKAFKVRQQLTFHQRIH
OZF_MOUSE 1486 YACKECGKAFNGKSYLKEHEKIH
OZF_MOUSE 1487 YTCKECGKAFSGKSNLTEHEKIH
Q9Z0Q5_MOUSE 1488 FICKECGKTFSGKSNLTEHEKIH
MFG3_MOUSE 1489 YKCKDCGKCFGCKSNLHQHESIH
Q61849_MOUSE 1490 YQCKECGKCFRQRSKLTEHESIH
Q61849_MOUSE 1491 YECKECGKCFGCRSTLTQHQSVH
Q61849_MOUSE 1492 FECEECGKKFRTARHLVKHQRIH
ZF92_MOUSE 1493 FVCRMCGKVFRRSFALLEHTRIH
ZF92_MOUSE 1494 YECSECGKQFQRSLALLEHQRIH
ZF35_MOUSE 1495 YECEECGKAFRMSSALVLHQRIH
P70405_MOUSE 1496 YECSECGKLFRQNSSLVDHQKTH
REX1_MOUSE 1497 HVCAECGKAFTESSKLKRHFLVH
TYY1_MOUSE 1498 HVCAECGKAFVESSKLKRHQLVH
ZFX2_MOUSE 1499 HICVECGKGFRHPSELKKHMRIH
ZFX1_MOUSE 1500 HICVECGKGFRHPSELKKHMRIH
ZFA_MOUSE 1501 HICVECGKGFCHPSELKKHMRIH
ZFY2_MOUSE 1502 HICGECGKGFRHPSALKKHIRVH
ZFY1_MOUSE 1503 FICGECGKGFRHPSALKKHIRVH
Q61116_MOUSE 1504 --CHECGKGFRQSSALQTHQRVH
Q06054_MOUSE 1505 YQCRKCGKCFRTYSSLYRHRRTH
Q9Z117_MOUSE 1506 HQCEKCRKCFSTASSLTVHKRIH
Q61898_MOUSE 1507 HQCGKCGKCFNTSSSLTVHHRIH
Q60585_MOUSE 1508 YDCKECGKAFRLFSQLTQHQSIH
Q60585_MOUSE 1509 YKCMECEKTFRLLSQLTQHQSIH
Q60585_MOUSE 1510 YDCKECGKAFRLHSSLIQHQRIH
KR2_MOUSE 1511 YQCKECGKAFRKNSSLIQHERIH
KID1_MOUSE 1512 YLCNECGNTFKSSSSLRYHQRIH
KR2_MOUSE 1513 YGCDECGKTFRQSSSLLKHQRIH
ZF37_MOUSE 1514 YKCMECGKTFRHSSMLMQHLRSH
Q62514_MOUSE 1515 YKCNECGKTFRHSSNLMQHLRSH
KID1_MOUSE 1516 YKCNECGKTFRCNSSLSNHQRTH
ZF37_MOUSE 1517 YECKECGKSFRYNSSLTEHVRTH
Q62514_MOUSE 1518 YECKECGKSFRYNSSLTEHVRTH
Q9Z117_MOUSE 1519 YKCKECGKSFLELSHLKRHYRIH
O88631_MOUSE 1520 HKCKECGKSFFILSHLKTHYRIH
Q61898_MOUSE 1521 YECKECGKSFIELSHLKRHYRIH
Q9Z1D7_MOUSE 1522 HGCDECGKSFTQHSRLIEHKRVH
O35738_MOUSE 1523 FKCADCDRRFSRSDHLALHRRRH
O89090_MOUSE 1524 --CPECPKRFMRSDHLSKHIKTH
Q64167_MOUSE 1525 --CPECPKRFMRSDHLSKHIKTH
O89087_MOUSE 1526 --CPECPKRFMRSDHLSKHIKTH
Q62445_MOUSE 1527 --CPECSKRFMRSDHLSKHVKTH
O89091_MOUSE 1528 --CPMCDRRFMRSDHLTKHARRH
Q61596_MOUSE 1529 --CPMCDRRFMRSDHLTKHARRH
BTE1_MOUSE 1530 --CPLCEKRFMRSDHLTKHARRH
Q62445_MOUSE 1531 FICNWMFCGKRFTRSDELQRHRRTH
Q64167_MOUSE 1532 FMCNWSYCGKRFTRSDELQRHKRTH
O89090_MOUSE 1533 FMCNWSYCGKRFTRSDELQRHKRTH
O89087_MOUSE 1534 FMCNWSYCGKRFTRSDELQRHKRTH
Q60843_MOUSE 1535 YHCNWEGCGWKFARSDELTRHYRKH
EZF_MOUSE 1536 YHCDWDGCGWKFARSDELTRHYRKH
Q60980_MOUSE 1537 YKCTWEGCTWKFARSDELTRHFRKH
O35738_MOUSE 1538 YKCTWEGCTWKFGRSDELTRHYRKH
Q9Z0Z7_MOUSE 1539 YKCTWEGCDWRFARSDELTRHYRKH
O70261_MOUSE 1540 YACSWDGCDWRFARSDELTRHYRKH
EKLF_MOUSE 1541 YACSWDGCDWRFARSDELTRHYRKH
Q61596_MOUSE 1542 FSCSWKGCERRFARSDELSRHRRTH
O89091_MOUSE 1543 FSCSWKGCERRFARSDELSRHRRTH
BTE1_MOUSE 1544 FPCTWPDCLKKFSRSDELTRHYRTH
EGR2_MOUSE 1545 YPCPAEGCDRRFSRSDELTRHIRIH
WT1_MOUSE 1546 YQCDFKDCERRFSRSDQLKRHQRRH
WT1_MOUSE 1547 FQCKTCQRKFSRSDHLKTHTRTH
EGR1_MOUSE 1548 FQCRICMRNFSRSDHLTTHIRTH
KR2_MOUSE 1549 YQCNECGKPFSRSTNLTRHQRTH
O35700_MOUSE 1550 YTCRYCGKIFPRSANLTRHLRTH
EVI1_MOUSE 1551 YTCRYCGKIFPRSANLTRHLRTH
ZF29_MOUSE 1552 FQCAECGKSFSRSPNLIAHQRTH
ZF38_MOUSE 1553 YVCTKCGKAFSHSSNLTLHYRTH
Q9Z1D8_MOUSE 1554 YQCDSCGKAFSYSSDLIQHYRTH
ZF29_MOUSE 1555 YQCGECGKNFSRSSNLATHRRTH
ZF29_MOUSE 1556 YRCPECGKGFSNSSNFITHQRTH
ZF38_MOUSE 1557 YICAECGKAFSNSSNLTKHRRTH
ZF29_MOUSE 1558 YECLTCGESFSWSSNLIKHQRTH
ZF90_MOUSE 1559 YECNECGEAFSRLSSLTQHERTH
MLZ4_MOUSE 1560 YHCNECGENFSRISHLVQHQRTH
ZF29_MOUSE 1561 YKCLMCGKSFSRGSILVMHQRAH
MLZ4_MOUSE 1562 YECEECGKSFSRSSHLAQHQRTH
MLZ4_MOUSE 1563 YKCYECGKGFSRSSHLIQHQRTH
O70162_MOUSE 1564 FACPECGQRFSQRLKLTRHQRTH
O35483_MOUSE 1565 FPCPECGKRFSQRSVLVTHQRTH
O35483_MOUSE 1566 --CDECGKGFVYRSHLAIHQRTH
ZFP1_MOUSE 1567 YECSECGKSFIQNSQLIIHRRTH
GFI1_MOUSE 1568 HKCQVCGKAFSQSSNLITHSRKH
O70237_MOUSE 1569 HKCQVCGKAFSQSSNLITHSRKH
ZF29_MOUSE 1570 YKCTECGQKFSQSSALITHRRTH
KID1_MOUSE 1571 FKCKECSKAFSQSSALIQHQITH
KID1_MOUSE 1572 CKCKVCGKAFRQSSALIQHQRMH
Z151_MOUSE 1573 YVCERCGKRFVQSSQLANHIRHH
O35700_MOUSE 1574 YECENCAKVFTDPSNLQRHIRSQH
EVI1_MOUSE 1575 YECENCAKVFTDPSNLQRHIRSQH
Q60585_MOUSE 1576 YECKKCAKIFTCSSDLRGHQRSH
Q9Z116_MOUSE 1577 YECTVCRKSFICKSSFSHHWRTH
KR2_MOUSE 1578 YTCNVCDKHFIERSSLTVHQRTH
Q61164_MOUSE 1579 FQCSLCSYASRDTYKLKRHMRTH
P97365_MOUSE 1580 FQCWLCSAKFKISSDLKRHMRVH
KID1_MOUSE 1581 YKCSMCEKTFINTSSLRKHEKNH
ZF35_MOUSE 1582 YTCNLCSKSFSQSSDLTKHQRVH
ZF35_MOUSE 1583 YHCSSCNKAFRQSSDLILHHRVH
ZF38_MOUSE 1584 YWCSHCGKTFCSKSNLSKHQRVH
Q9Z1D9_MOUSE 1585 YKCGDCEKSFRQRSDLFKHQRTH
Q9Z1D9_MOUSE 1586 YKCDSCEKGFRQRSDLFKHQRIH
ZF35_MOUSE 1587 YPCSQCSKMFSRRSDLVKHYRIH
ZF35_MOUSE 1588 YQCSHCSKSFSQHSGMVKHLRIH
ZF35_MOUSE 1589 YACTQCPRSFSQKSDLIKHQRIH
ZF35_MOUSE 1590 YPCAQCNKSFSQNSDLIKHRRIH
ZF35_MOUSE 1591 YMCNHCYKHFSQSSDLIKHQRIH
ZF35_MOUSE 1592 YNCDECDQSFAWSTGLIRHQRTH
Q9Z1D9_MOUSE 1593 YQCQECGKRFSQSAALVKHQRTH
Q9Z1D9_MOUSE 1594 YACWCGRRFSQSATLIKHQRTH
Q9Z116_MOUSE 1595 YECKQCMKTFYRKSGLTRHQRTH
Q06054_MOUSE 1596 YECKQCSKSFYTSSHLENHYRTH
Q9Z116_MOUSE 1597 YECQLCQKAFYCTSHLIVHQRTH
ZF29_MOUSE 1598 YECPQCGKTFSRKSHLITHERTH
MLZ4_MOUSE 1599 YECVQCGKGFTQSSNLITHQRVH
ZF37_MOUSE 1600 YECNHCGKVLSHKQGLLDHQRTH
Q62514_MOUSE 1601 YECNHCGKVLSHKQGLLDHQRTH
ZF90_MOUSE 1602 YECNECGRAFRKKTNLHDHQRTH
Q61491_MOUSE 1603 YECNQCGRAFRQYVYLQCHERIH
ZF35_MOUSE 1604 YPCAQCGKSFSQRSDLWHQRVH
Q64247_MOUSE 1605 YVCEQCGKGFIQLKYLLMHQRSH
Q61116_MOUSE 1606 YTCQQCGKGFSQASYFHMHQRVH
O35483_MOUSE 1607 YRCVFCGAGFGRRSYCVTHQRTH
ZF29_MOUSE 1608 YRCGDCGKGFSQRSQLWHQRTH
Q61117_MOUSE 1609 YRCDICGKRFRQRSYLHDHHRIH
Q9Z2U2_MOUSE 1610 FKCWPSCTKTFTRNSNLRAHCQLVH
Q61116_MOUSE 1611 YRCDSCGKGFSRSSDLNIHRRVH
Q61117_MOUSE 1612 YQCHACWKSFCHSSEFNNHIRVH
Z239_MOUSE 1613 YQCYECGKGFSQSSDLRIHLRVH
Z239_MOUSE 1614 FKCDRCGKGFSQSSKLHIHKRVH
Z239_MOUSE 1615 YHCGKCGQGFSQSSKLLIHQRVH
Z239_MOUSE 1616 YKCGECGKGFSQSSNLHIHRCTH
ZF35_MOUSE 1617 YKCDECGKAFSQSSDLMIHQRIH
ZF38_MOUSE 1618 YDCKCGKAFGQSSDLLKHQRMH
O35700_MOUSE 1619 YRCKYCDRSFSISSNLQRHVRNIH
EVI1_MOUSE 1620 YRCKYCDRSFSISSNLQRHVRNIH
O35483_MOUSE 1621 YRCVFCGRSFSQSSALARHQAVH
O35483_MOUSE 1622 YLCSNCGRRFSQSSHLLTHMKTH
O70162_MOUSE 1623 FVCGECGRSFSRSSHLLRHQLTH
O88412_MOUSE 1624 YECAKCGAAFISNSHLMRHHRTH
O88631_MOUSE 1625 YKCMECDRSYIQYSHLKRHQKVH
O88631_MOUSE 1626 YKCKECGKSYAYRTGLKRHQKIH
Z239_MOUSE 1627 YECSKCGKGFSQSSNLHIHQRVH
Z239_MOUSE 1628 YACEECGMSFSQRSNLHIHQRVH
MLZ4_MOUSE 1629 YECNECWRSFGERSDLIKHQRTH
MLZ4_MOUSE 1630 YECHECGRGFSERSDLIKHYRVH
Q61116_MOUSE 1631 YECNECGKRFSLSGNLDIHQRVH
Q61116_MOUSE 1632 YKCGDCGKRFSCSSNLHTHQRVH
Q62518_MOUSE 1633 YKCGECGKSFICSSNLYIHQRVH
Q9Z150_MOUSE 1634 CPRCGKQFNHSSNLNRHMNVHRG
Q61116_MOUSE 1635 FHCSVCGKNFSRSSHFLDHQRIH
Q61116_MOUSE 1636 KCNVCQKQFSKTSNLQAHQRVH
Q62518_MOUSE 1637 YSCDVCGKGFSRSSQLQSHQRVH
Q62518_MOUSE 1638 FKCDACGKSFSRSSHLRSHQRVH
Q61898_MOUSE 1639 YKCRECDKSFTQRAYLRNHHNRVH
Q61898_MOUSE 1640 YKCMECDKSFTHNSNFRTHQRVH
Q9Z117_MOUSE 1641 YKCMECNKSFTQDSHLRTHQRVH
Q61898_MOUSE 1642 YKCIECDKSFTQVSHLRTHQRVH
O88631_MOUSE 1643 YKCSECDKSFTQASQLRTHQRVH
Q61898_MOUSE 1644 YKCNECDRSFTHYASLRWHQKTH
Q9Z117_MOUSE 1645 YKCKECDKSFAHCSSFRRHQKTH
Q61898_MOUSE 1646 YKCKECDKSFAHYPNFRTHQKIH
O88631_MOUSE 1647 YKCKDCDIFFNHYSSLRRHQKVH
Q9Z117_MOUSE 1648 YKCKDCDISFIQISNLRRHQRVH
Q61898_MOUSE 1649 YKCRDCDISFSQISNLRRHQKLH
Q9Z117_MOUSE 1650 FKCRECDKSFTKCSHLRRHQSVH
Q61898_MOUSE 1651 YKCRECDKSFIHSSHLRRHQNVH
Q9Z117_MOUSE 1652 YKCRECDKSFIQRSNLIIHQRVH
Q06054_MOUSE 1653 YKCSECEKSFTCGSVLRKHQKIH
Q06054_MOUSE 1654 YKCSECEKSFTVGSDLRMHQKIH
Q06054_MOUSE 1655 YKCSECEKCFTWSDLRTHQKIH
Q06054_MOUSE 1656 YKCSECEKSFTVGSSLRIHQRIH
Q06054_MOUSE 1657 YKCECGKSFTVGSDLRKHQKCH
Q61898_MOUSE 1658 YKCIECGKSFTNNSYLRTHQKVH
Q61898_MOUSE 1659 YRCKECDKSFHESATLREHEKSH
Q61898_MOUSE 1660 YRCAECDKSFTRCSYLRAHQKIH
Q9Z117_MOUSE 1661 YRCKECDKSFTECSTLRAHQKIH
Q61898_MOUSE 1662 YRCKECDKSFTSCSTLKAHQSIH
Q9Z117_MOUSE 1663 YICKECGKSFTRCSYLRAHQKIH
O88631_MOUSE 1664 YVCKECGKSLTTCAILRAHQKIH
Q61898_MOUSE 1665 YECKECGKSFTTCSTLRIHQTIH
Q9Z117_MOUSE 1666 YICKECGKSFTKCSTLQIHQKIH
O88631_MOUSE 1667 YTCKQCGKSFTRGSTLRVHQRIH
O88631_MOUSE 1668 YKCNICDKSFTECSSLKEHRKTH
Q9Z117_MOUSE 1669 YKCEVCDKSFTVNSTLKTHLKIH
Q61898_MOUSE 1670 YKCEICDKSFTTTTTLKTHQKIH
Q9Z117_MOUSE 1671 YKCSVCGKSFTQCTNLKTHQRLH
Q61898_MOUSE 1672 YKCSVCDKSFTQCTHLKIHQRRH
KID1_MOUSE 1673 YRCKECGKSFGRRSGLFIHQKVH
ZF29_MOUSE 1674 YSCPECGKSFGNRSSLNTHQGIH
Q9Z117_MOUSE 1675 YKCKECGKSFPQLSALKSHQKIH
Q61898_MOUSE 1676 YKCKECEKSFVQLSALKSHQKLH
O88631_MOUSE 1677 YKCNDCGKSFSYLSALQSHHKRH
Q08376_MOUSE 1678 FVCEMCTKGFTTQAHLKEHLKIH
Q60636_MOUSE 1679 FKCQTCNKGFTQLAHLQKHYLVH
Q61116_MOUSE 1680 YKCEVCGKGFTQWAHLQAHERIH
O88282_MOUSE 1681 YKCETCGSRFVQVAHLRAHVLIH
Q61065_MOUSE 1682 YKCETCGARFVQVAHLRAHVLIH
BCL6_MOUSE 1683 YKCETCGARFVQVAHLRAHVLIH
O88631_MOUSE 1684 YRCEVCDKWFTLSSSLSRHQKIH
Q61116_MOUSE 1685 YRCEVCGKRFPWSLSLHSHQSVH
Z239_MOUSE 1686 YKCDKCGKGFTRSSSLLVHHSLH
ZF29_MOUSE 1687 YKCGLCGKSFSQSSSLIAHQGTH
Q62518_MOUSE 1688 YKCVDCGKEFSRPSSLQAHQGIH
Q61117_MOUSE 1689 YRCEECGKGFSWSSSLLIHQRAH
Q61117_MOUSE 1690 YKCEECGKVFSWSSYLKAHQRVH
Q61116_MOUSE 1691 FKCEECGKEFRWSVGLSSHQRVH
Q61117_MOUSE 1692 YKCETCGKAFSRVSILQVHQRVH
Q61116_MOUSE 1693 YKCEECGKGFSSASSFQSHQRVH
Q61116_MOUSE 1694 YKCGECGKGFSHASSLQAHHSVH
Q61117_MOUSE 1695 YQCAECGRGFTVESHLQAHQRSH
Q61117_MOUSE 1696 YQCEECGRGFCRASNFLAHRGVH
Q61117_MOUSE 1697 YKCEECGKGFTRASTLLDHQRGH
Q61117_MOUSE 1698 YVCEECGKGFSQASHLLAHQRGH
Q62518_MOUSE 1699 YNCETCGSAFSQASHLQDHQRLH
ZF29_MOUSE 1700 YRCPECGKGFSWNSVLIIHQRIH
O70162_MOUSE 1701 YCCGECDLGFTQVSRLTEHQRIH
KID1_MOUSE 1702 YRCSECGKGFTSISRLNRHRIIH
TYY1_MOUSE 1703 YVCPFDGCNKKFAQSTNLKSHILTH
REX1_MOUSE 1704 YQCTFEGCGKRFSLDFNLRTHIRIH
TYY1_MOUSE 1705 FQCTFEGCGKRFSLDFNLRTHVRIH
MTF1_MOUSE 1706 YQCTFEGCPRTYSTAGNLRTHQKTH
GLI_MOUSE 1707 HKCTFEGCRKSYSRLENLKTHLRSH
GLI3_MOUSE 1708 HKCTFEGCTKAYSRLENLKTHLRSH
ZIC2_MOUSE 1709 FQCEFEGCDRRFANSSDRKKHMHVH
ZIC1_MOUSE 1710 FKCEFEGCDRRFANSSDRKKHMHVH
ZIC3_MOUSE 1711 FKCEFEGCDRRFANSSDRKKHMHVH
ZIC4_MOUSE 1712 FRCEFEGCERRFANSSDRKKHSHVH
GLI_MOUSE 1713 YMCEQEGCSKAFSNASDRAKHQNRTH
GLI3_MOUSE 1714 YVCEHEGCNKAFSNASDRAKHQNRTH
O70230_MOUSE 1715 YVCTVPGCDKRFTEYSSLYKHHWH
MTF1_MOUSE 1716 FECDVQGCEKAFNTLYRLKAHQRLH
MTF1_MOUSE 1717 FVCNQEGCGKAFLTSYSLRIHVRVH
O70230_MOUSE 1718 YQCEHSGCGKAFATGYGLKSHFRTH
MTF1_MOUSE 1719 FRCDHDGCGKAFAASHHLKTHVRTH
O70230_MOUSE 1720 FKCPIEGCGRSFTTSNIRKVHIRTH
ZIC4_MOUSE 1721 FPCPFPGCGKVFARSENLKIHKRTH
ZIC2_MOUSE 1722 FPCPFPGCGKVFARSENLKIHKRTH
ZIC1_MOUSE 1723 FPCPFPGCGKVFARSENLKIHKRTH
ZIC3_MOUSE 1724 FPCPFPGCGKIFARSENLKIHKRTH
O70230_MOUSE 1725 YYCTEPGCGRAFASATNYKNHVRIH
O70230_MOUSE 1726 YRCSEDNCTKSFKTSGDLQKHIRTH
MTF1_MOUSE 1727 FNCESQGCSKYFTTLSDLRKHIRTH
O70230_MOUSE 1728 FRCKYDGCGKLYTTAHHLKVHERSH
BTE1_MOUSE 1729 HKCPYSGCGKVYGKSSHLKAHYRVH
Q9Z0Z7_MOUSE 1730 --CDYNGCTKVYTKSSHLKAHLRTH
Q60980_MOUSE 1731 HRCDYDGCNKVYTKSSHLKAHRRTH
O35738_MOUSE 1732 HRCDFEGCNKVYTKSSHLKAHRRTH
Q61596_MOUSE 1733 HICSHPGVGKTYFKSSHLKAHVRTH
O89091_MOUSE 1734 HICSHPGCGKTYFKSSHLKAHVRTH
Q60843_MOUSE 1735 HTCSYTNCGKTYTKSSHLKAHLRTH
EZF_MOUSE 1736 HTCDYAGCGKTYTKSSHLKAHLRTH
Q64167_MOUSE 1737 HICHIQGCGKVYGKTSHLRAHLRWH
O89090_MOUSE 1738 HICHIQGCGKVYGKTSHLRAHLRWH
O89087_MOUSE 1739 HICHIQGCGKVYGKTSHLRAHLRWH
Q62445_MOUSE 1740 HVCHIEGCGKVYGKTSHLRAHLRWH
O70261_MOUSE 1741 HTCGHEGCGKSYTKSSHLKAHLRTH
EKLF_MOUSE 1742 HTCGHEGCGKSYSKSSHLKAHLRTH
WT1_MOUSE 1743 FMCAYPGCNKRYFKLSHLQMHSRKH
ZEP1_MOUSE 1744 YICEYCNRACAKPSVLLKHIRSH
Q61479_MOUSE 1745 YICQYCSRPCAKPSVLQKHIRSH
O55140_MOUSE 1746 YICPYCSRACAKPSVLKKHIRSH
Q60636_MOUSE 1747 HECQVCHKRFSSTSNLKTHLRLH
SNAI_MOUSE 1748 CVCTTCGKAFSRPWLLQGHVRTH
P97469_MOUSE 1749 CVCKICGKAFSRPWLLQGHIRTH
ZIC2_MOUSE 1750 HVCFWEECPREGKPFKAKYKLVNHIRVH
ZIC3_MOUSE 1751 HVCYWEECPREGKSFKAKYKLVNHIRVH
Q62065_MOUSE 1752 HECKLCGASFRTKGSLIRHHRRH
Q62065_MOUSE 1753 HVCQFCSRGFREKGSLVRHVRHH
IKAR_MOUSE 1754 FQCNQCGASFTQKGNLLRHIKLH
Q9Z2Z2_MOUSE 1755 FHCNQCGASFTQKGNLLRHIKLH
HELI_MOUSE 1756 FHCNQCGASFTQKGNLLRHIKLH
Q61164_MOUSE 1757 HKCHLCGRAFRTVTLLRNHLNTH
Q61624_MOUSE 1758 HVCEHCNAAFRTNYHLQRHVFIH
P97475_MOUSE 1759 HVCEHCNAAFRTNYHLQRHVFIH
Z151_MOUSE 1760 YVCTHCQRQFADPGGLQRHVRIH
Q62511_MOUSE 1761 YICEYCARAFKSSHNLAVHRMIH
MAZ_MOUSE 1762 YICALCAKEFKNGYNLRRHEAIH
O88939_MOUSE 1763 YECNICKVRFTRQDKLKVHMRKH
Q64321_MOUSE 1764 --CEVCGVRFTRNDKLKIHMRKH
P97365_MOUSE 1765 PHKCEVCGKCFSRKDKLKTHMRCH
O88939_MOUSE 1766 YLCQQCGAAFAHNYDLKNHMRVH
Q64321_MOUSE 1767 YSCPHCPARFLHSYDLKNHMHLH
Z151_MOUSE 1768 HKCEDCGKEFTHTGNFKRHIRIH
Z151_MOUSE 1769 YRCGDCGKLFTTSGNLKRHQLVH
Z151_MOUSE 1770 -KCRECGKQFTTSGNLKRHLRIH



Chicken database.

35 finger units SEQ ID NO

Q92010_CHICK 1771 YSCEVCGKSFIRAPDLKKHERVH
Q90851_CHICK 1772 YPCTICGKKFTQRGTMTRHMRSH
Q90850_CHICK 1773 YPCTICGKKFTQRGTMTRHMRSH
Q90851_CHICK 1774 --CDACGMRFTRQYRLTEHMRIH
Q90850_CHICK 1775 --CDACGMRFTRQYRLTEHMRIH
CTCF_CHICK 1776 HKCPDCDMAFVTSGELVRHRRYKH
ZKR1_CHICK 1777 -TCGDCGKGFAWASHLQRHRRVH
ZKR1_CHICK 1778 HRCGDCGKGFAWASHLQRHRRVH
ZKR1_CHICK 1779 HRCGDCGKGFVWASHLERHRRVH
ZKR1_CHICK 1780 --CPDCGKSFPWASHLERHRRVH
Q92010_CHICK 1781 --CHMCDKAFKHKSHLKDHERRH
O42408_CHICK 1782 HECGICKKAFKHKHHLIEHMRLH
DEFI_CHICK 1783 HECGICKKAFKHKHHLIEHMRLH
O42408_CHICK 1784 FKCTECGKAFKYKHHLKEHLRIH
DEFI_CHICK 1785 FKCTECGKAFKYKHHLKEHLRIH
O42409_CHICK 1786 YPCQYCGKRFHQKSDMKKHTYIH
O42409_CHICK 1787 FECKMCGKTFKRSSTLSTHLLIH
ZKR1_CHICK 1788 YECPECGEAFSQGSHLTKHRRSH
ZKR1_CHICK 1789 YECPECGEAFSQGSHLTKHRRSH
ZKR1_CHICK 1790 YSCPECGESYSQSSHLVQHRRTH
O42409_CHICK 1791 HKCQVCGKAFSQSSNLITHSRKH
O57415_CHICK 1792 YQCNICDYIAADKAALIRHLRTH
CTCF_CHICK 1793 FQCSLCSYASRDTYKLKRHMRTH
O57415_CHICK 1794 YKCQTCERTFTLKHSLVRHQRIH
Q92010_CHICK 1795 FVCEMCTKGFTTQAHLKEHLKIH
O57415_CHICK 1796 -TCPYCPRVFSWASSLQRHMLTH
O57415_CHICK 1797 HSCSICGKSLSSASSLDRHMLVH
O57415_CHICK 1798 --CTVCNKRFWSLQDLTRHMRSH
Q91051_CHICK 1799 CVCKICGKAFSRPWLLQGHIRTH
O12939_CHICK 1800 CVCKMCGKAFSRPWLLQGHIRTH
O57415_CHICK 1801 YKCSVCGQSFTTNGNMHRHMKIH
IKAR_CHICK 1802 FQCNQCGASFTQKGNLLRHIKLH
CTCF_CHICK 1803 HKCHLCGRAFRTVTLLRNHLNTH
O93567_CHICK 1804 YECNIClsr\n[lFTRQDKLKVHMRKH
O93567_CHICK 1805 YLCQQCGAAFAHNYDLKNHMRVH

Plant Database.



52 finger units SEQ ID NO

O22089_PETHY 1806 HECSICGEQFLLGQALGGHMRKH
O22088_PETHY 1807 HECSFCGEDFPTGQALGGHMRKH
O22087_PETHY 1808 -ECSFCGEDFPTGQALGGHMRKH
Q39092_ARATH 1809 HKCKLCWKSFANGRALGGHMRSH
Q39217_ARATH 1810 HKCSICSQSFGTGQALGGHMRRH
P93713_PETHY 1811 HECSICGLEFAIGQALGGHMRRH
O22086_PETHY 1812 HECSICGLEFPIGQALGGHMRRH
O22085_PETHY 1813 HECSICGMEFSLGQALGGHMRRH
O22084_PETHY 1814 HECSICGMEFSLGQALGGHMRRH
Q42453_ARATH 1815 HPCPICGVKFPMGQALGGHMRRH
Q42410_ARATH 1816 HPCPICGVEFPMGQALGGHMRRH
O65150_TOBAC 1817 HVCSICHKAFPTGQALGGHKRRH
Q40897_PETHY 1818 HVCSICHKAFPTGQALGGHKRRH
Q40896_PETHY 1819 HVCSICHKAFPSGQALGGHKRRH
Q42430_WHEAT 1820 HRCSICQKEFPTGQALGGHKRKH
Q40899_PETHY 1821 HECSICHKCFPTGQALGGHKRCH
P93166_SOYBN 1822 HECSICHKSFPTGQALGGHKRCH
Q96289_ARATH 1823 HVCTICNKSFPSGQALGGHKRCH
Q42423_ARATH 1824 HVCTICNKSFPSGQALGGHKRCH
O22533_ARATH 1825 HVCSICHKSFATGQALGGHKRCH
Q40898_PETHY 1826 HECSICHKCFSSGQALGGHKRRH
Q38895_ARATH 1827 YTCSFCKREFRSAQALGGHMNVH
O23621_ARATH 1828 YTCNFCRREFRSAQALGGHMNVH
O80942_ARATH 1829 YTCSFCRREFKSAQALGGHMNVH
P93714_PETHY 1830 HECSYCGAEFTSGQALGGHMRIUi
O43614_PETHY 1831 HECAICGAEFTSGQALGGHMRRH
O22083_PETHY 1832 HECSICGAEFTSGQALGGHMRRH
Q41070_PEA 1833 HECSICGAEFTSGQALGGHMRRH
Q42375_ARATH 1834 HECSICGSEFTSGQALGGHMRRH
O65499_ARATH 1835 HKCNICFRVFSSGQALGGHMRCH
O22090_PETHY 1836 HECPVCFRVFSSGQALGGHKRTH
O22082_PETHY 1837 HECPVCYRVFSSGQALGGHKRSH
P93717_PETHY 1838 HECSICHRVFSTGQALGGHKRCH
O04177_BRARA 1839 HTCSICFKSFSSGQALGGHKRCH
O04176_BRARA 1840 HTCSICFKSFSSGQALGGHKRCH
P93715_PETHY 1841 HQCSICHRVFSSGQALGGHKRCH
Q39092_ARATH 1842 HECPICAKVFTSGQALGGHKRSH
P93719_PETHY 1843 HECPYCDRVFKSGQALGGHKRSH
P93718_PETHY 1844 HACPFCPRMFKSGQALGGHKRSH
O22091_PETHY 1845 YECPLCFKIFQSGQALGGHKRSH
Q42430_WHEAT 1846 -KCSVCGKSFSSYQALGGHKTSH
O04177_BRARA 1847 YKCTVCGKSFSSYQALGGHKTSH
O04176_BRARA 1848 YKCTVCGKSFSSYQALGGHKTSH
Q96289_ARATH 1849 YKCSVCDKTFSSYQALGGHKASH
Q42423_ARATH 1850 YKCSVCDKTFSSYQALGGHKASH
Q40897_PETHY 1851 YKCSVCDKSFSSYQALGGHKASH
Q40896_PETHY 1852 YKCSVCDKSFSSYQALGGHKASH
Q40898_PETHY 1853 YKCNVCNKSFHSYQALGGHKASH
O65150_TOBAC 1854 YKCSVCDKAFSSYQALGGHKASH
P93166_SOYBN 1855 YKCSVCDKSFPSYQALGGHKASH
Q40899_PETHY 1856 YKCSVCGKGFGSYQALGGHKASH
O22533_ARATH 1857 YKCSVCDKAFSSYQALGGHKASH



Arabidopsis Database

SEQ ID NO

Q9ZU64/169-191 1858 YTCPKCMSIFDTSQKFAAHMSSH
O23621/40-62 1859 YTCNFCRREFRSAQALGGHMNVH
O23504/5-27 1860 HKCKLCSKSFCNGRALGGHMKSH
Q9SYC5/250-275 1861 WYCSCGSDFKHKRSLKDHVKAFGNGH
Q9SYC5/224-246 1862 FACRMCGKAFAVKGDWRTHEKNC
O22533/89-111 1863 YKCSVCDKAFSSYQALGGHKASH
O22533/148-170 1864 HVCSICHKSFATGQALGGHKRCH
Q9SN24/149-171 1865 HNCSICFKSFPSGQALGGHKRCH
Q9SN24/94-116 1866 YKCSVCGKSPPSYQALGGHKTSH
Q9STI7/117-140 1867 YFCGVCDRRFYTNEKLINHFKQIH
Q9STM3/1296-1320 1868 LKCPWKGCKMTFKWAWSRTEHIRVH
Q9STM3/1243-1268 1869 YQCNMEGCTMSFSSEKQLMLHKRNIC
Q9STM3/1271-1290 1870 KGCGKNFFSHKYLVQHQRVH
Q9STM3/1326-1352 1871 YVCAEPDCGQTFRFVSDFSRHKRKTGH
Q9STM3/1296-1320 1872 LKCPWKGCKMTFKWAWSRTEHIRVH
O81801/142-164 1873 PMCNVCGKGFASWKAVFGHLRQH
O65601/61-83 1874 QKCEKCSREFCSPVNFRRHNRMH
Q9SFY6/118-140 1875 YKCSVCDKTFSSYQALGGHKASH
Q9SFY6/174-196 1876 HVCTICNKSFPSGQALGGHKRCH
O65245/147-171 1877 FYCELCSKQYRTVMEFEGHLSSYDH
Q39261/52-74 1878 FSCNYCQRKFYSSQALGGHQNAH
Q9SSW0/118-140 1879 HVCSVCGKSFATGQALGGHKRCH
Q9SSW0/75-97 1880 YKCGVCYKTFSSYQALGGHKASH
Q39262/61-83 1881 FSCNYCQRTFYSSQALGGHQNAH
Q9SSW1/164-186 1882 HTCSICFKSFASGQALGGHKRCH
Q9SSW1/97-119 1883 YKCTVCGKSFSSYQALGGHKTSH
Q9ZPT0/145-167 1884 WVCERCSKGYAVQSDYKAHLKTC
Q9ZPT0/67-89 1885 YICEICNQGFQRDQNLQMHRRRH
Q9ZPT0/172-193 1886 HSCDCGRVFSRVESFIEHQDNC
Q39263/85-107 1887 FSCNYCQRKFYSSQALGGHQNAH
Q9SGD1/291-316 1888 WYCTCGSDFKHKRSLKDHIRSFGSGH
Q9SGD1/265-287 1889 FSCGKCGKALAVKGDWRTHEKNC
Q9SGD1/180-202 1890 FACSICSKTFNRYNNMQMHMWGH
Q9SSW2/106-128 1891 YKCNVCEKAFPSYQALGGHKASH
Q9SSW2/165-187 1892 HECSICHKVFPTGQALGGHKRCH
Q39264/60-82 1893 HECQYCGKEFANSQALGGHQNAH
P93815/7-30 1894 QECAVCKRVFLSSHQLISHYNAAH
Q39265/41-63 1895 YECQYCCREFANSQALGGHQNAH
Q39266/59-81 1896 FSCNYCRRKFYSSQALGGHQNAH
Q39267/93-115 1897 FECHYCFRNFPTSQALGGHQNAH
Q9SVY1/301-323 1898 FMCRKCGKAFAVRGDWRTHEKNC
Q9SVY1/217-239 1899 FSCPVCFKTFNRYNNMQMHMWGH
Q9SGH2/1804-1827 1900 IHCLICHKTFASDDEFEDHTESKC
Q38895/47-69 1901 YTCSFCKREFRSAQALGGHMNVH
Q9SLB8/49-71 1902 YTCSFCRREFRSAQALGGHMNVH
Q9SL35/188-210 1903 HECSICGSEFTSGQALGGHMRRH
Q9SL35/113-135 1904 YECKTCNRTFSSFQALGGHRASH
O81013/49-71 1905 HFCVICEKQFSSGKAYGGHVRIH
O81013/119-141 1906 IRCCLCGKEFQTMHSLFGHMRRH
O23395/664-686 1907 LHCEKCGKALQPTEMEKHLKVFH
Q9SI97/34-56 1908 FACKTCNKEFPSFQALGGHRASH
Q9SI97/78-100 1909 HECPICGAEFAVGQALGGHMRKH
Q9SR34/35-57 1910 YVCSFCIRGFSNAQALGGHMNIH
Q42485/68-90 1911 FSCNYCQRKFYSSQALGGHQNAH
O82389/126-149 1912 FPCNSCGEIFPKINLLENHIAIKH
Q9SQX8/182-204 1913 YQCKTCDRTFPSFQALGGHRASH
Q9SQX8/261-283 1914 HECGICGAEFTSGQALGGHMRRH
O65499/222-244 1915 HKCNICFRVFSSGQALGGHMRCH
O65499/77-99 1916 RPCTECGRKFWSWKALFGHMRCH
O65499/162-184 1917 FECGGCKKVFGSHQALGGHRASH
Q9SCM4/220-243 1918 DVCPKCSRGFRDPVDLLKHIDKDH
Q96289/80-102 1919 YKCSVCDKTFSSYQALGGHKASH
Q96289/136-158 1920 HVCTICNKSFPSGQALGGHKRCH
Q9SCQ6/139-161 1921 WKCDKCSKKYAVQSDWKAHSKIC
Q9SCQ6/166-187 1922 YKCDCGTLFSRRDSFITHRAFC
Q9SCQ6/63-85 1923 FVCEICNKGFQRDQNLQLHRRGH
Q9SFS1/70-92 1924 YVCEICNQGFQRDQNLQMHRRRH
Q9SFS1/148-170 1925 WICERCSKGYAVQSDYKAHLKTC
Q9SFS1/175-196 1926 HSCDCGRVFSRVESFIEHQDTC
Q9SSA6/575-598 1927 IHCLICHKTFASDDEFEDHTESKG
Q42410/39-61 1928 FTCKTCLKQFHSFQALGGHRASH
Q42410/82-104 1929 HPCPICGVEFPMGQALGGHMRRH
Q9XFP6/12-35 1930 VWCYYCDREFDDEKILVQHQKAKH
Q9XFP6/36-59 1931 FKCHVCHKKLSTASGMVIHVLQVH
O22238/218-241 1932 VSCGSCKKTFNSGNALESHNKAKH
Q42453/40-62 1933 FRCKTCLKEFSSFQALGGHRASH
Q42453/86-108 1934 HPCPICGVKFPMGQALGGHMRRH
Q42375/113-135 1935 YECKTCNRTFSSFQALGGHRASH
Q42375/188-210 1936 HECSICGSEFTSGQALGGHMRRH
O22759/159-181 1937 WKCEKCSKFYAVQSDWKAHTKIC
O22759/186-207 1938 YRCDCGTLFSRKDTFITHRAFC
O22759/82-104 1939 FVCEICNKGFQRDQNLQLHRRGH
Q9ZUL3/81-103 1940 FICEVCNKGFQREQNLQLHRRGH
Q9ZUL3/157-179 1941 WKCDKCSKRYAVQSDWKAHSKTC
Q9ZUL3/184-205 1942 YRCDCGTLFSRRDSFITHRAFC
P93751/95-117 1943 FECHYCFRNFPTSQALGGHQNAH
O81827/196-219 1944 VSCHKCGEKFSKLEAAEAHHLTKH
Q9ZUL4/82-104 1945 WKCEKCSKRYAVQSDWKAHSKTC
Q9ZUL4/109-130 1946 YRCDCGTIFSRRDSYITHRAFC
Q9ZUL4/6-28 1947 FICDVCNKGFQREQNLQLHRRGH
Q9SHD0/194-216 1948 FKCETCGKVFKSYQALGGHRASH
Q9SHD0/243-265 1949 HECPICFRVFTSGQALGGHKRSH
Q9SHD0/4-26 1950 YKCRFCFKSFINGRALGGHMRSH
O64936/131-153 1951 YQCNVCGRELPSYQALGGHKASH



WO 02/099084

PCT/US02/22272



O64936/179-201 1952 HKCSICHREFSTGHSLGGHKRLH
Q9SIJ0/65-87 1953 RPCTECGKQFGSLKALFGHMRCH
Q9SIJ0/148-170 1954 FECDGCKKVFGSHQALGGHRATH
Q9SIJ0/211-233 1955 HRCNICSRVFSSGQALGGHMRCH
Q9SLD4/47-69 1956 FECKTCNKRFSSFQALGGHRASH
Q9SLD4/94-116 1957 HKCSICSQSFGTGQALGGHMRRH
Q9ZU93/121-143 1958 FECPICKNPFTSEEEVSVHVESC
Q9SFT3/177-200 1959 CACPQCGEVFPKLESLEHHQAVRH
Q9ZQE0/244-266 1960 YTCPKCNGVFNTSQKFAAHMSSH
Q42423/80-102 1961 YKCSVCDKTFSSYQALGGHKASH
Q42423/136-158 1962 HVCTICNKSFPSGQALGGHKRCH
Q9ZWA6/146-168 1963 WKCEKCAKRYAVQSDWKAHSKTC
Q9ZWA6/173-194 1964 YRCDCGTIFSRRDSFITHRAFC
Q9ZWA6/70-92 1965 FLCEICGKGFQRDQMLQLHRRGH
O80942/39-61 1966 YTCSFCRREFKSAQALGGHMNVH
Q39217/90-112 1967 HKCSICSQSFGTGQALGGHMRRH
Q39217/43-65 1968 FECKTCNKRFSSFQALGGHRASH
Q39092/160-182 1969 FECETCEKVFKSYQALGGHRASH
Q39092/209-231 1970 HECPICAKVFTSGQALGGHKRSH
Q39092/5-27 1971 HKCKLCWKSFANGRALGGHMRSH
O81793/138-160 1972 PVCHICGRGFGSWKAVFGHMRAH
O64828/530-553 1973 LQCIPCGSHFGDKEQLLVHVQAVH
O64828/599-622 1974 FVCKFCGLKFNLLPDLGRHHQAEH
O64828/496-519 1975 FACAICLDSFVRRKLLEIHVEERH
O49591/251-278 1976 FMCLYCNELCRPFSSLEAVRKHMEAKSH
O49591/26-50 1977 LTCNACNMEFKDEEERNLHYKSDWH
O49591/90-114 1978 YTCAICAKGYRSSKAHEQHLQSRSH



There follow several examples of how to construct and select DNA-binding sub-domains
from libraries of natural zinc fingers.

Example 4: Human Zinc Finger Module 'Mini-Library'

As a preliminary test of the efficacy of using natural zinc finger modules for constructing
novel DNA-binding domains, a 'mini-library' of natural, human zinc finger modules is
generated. The mini-library comprises 8 zinc finger modules, which have the following
nomenclature assigned to them in the human genome database: Zif268 finger 1, Zif268
finger 2, Sp1 finger 3, WT1 finger 1, O15391, O75626, ZN45 and Z165. Since there is
more than one zinc finger module belonging to the zinc finger proteins ZN45 and Z165,
we have called the selected modules ZN45-(AAA) and Z165-(GCC) respectively,
according to their predicted binding site. We have also predicted the binding sites for the
zinc fingers O15391 and O75626. The preferred binding sites for Zif268 finger 1, Zif268
finger 2, Sp1 finger 3 and WT1 finger 1 are already known. The amino acid sequences of
each of the stated modules, and their predicted or previously determined binding
sequences are shown in Table 3.

Two 3-zinc finger peptide libraries are prepared, containing the 8 zinc finger modules
stated. All novel 3-finger peptides contain a leader sequence, MAEERP (SEQ ID
NO:16), at the start of the peptide and are tagged by the sequence
LRQKDGGGSYPYDVPDYA (SEQ ID NO:1989) at the C-terminus. This sequence
provides: (in the absence of a further C-terminal finger) a suitable terminus to the final
α-helix of the peptide, -LRQKD- (SEQ ID NO:1987), as found in wild-type Zif268; a short,
flexible linker sequence, GGGS (SEQ ID NO:2121); and an HA-tag (YPYDVPDYA
(SEQ ID NO:2122)), which is recognised by the HA-antibody. Adjacent zinc finger
modules are fused using the linker peptide sequence TGEKP (SEQ ID NO:3). The
peptide sequences described above are also displayed in Table 3.
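
The layout of a 3-finger peptide described above (leader, three fingers joined by TGEKP linkers, C-terminal tag) can be sketched as a simple string assembly. This is only an illustration: the peptide units are copied from Table 3, while the function name and dictionary are hypothetical helpers introduced here.

```python
# Peptide units copied from Table 3; helper names are illustrative only.
LEADER = "MAEERP"               # SEQ ID NO:16
LINKER = "TGEKP"                # SEQ ID NO:3
TAG = "LRQKDGGGSYPYDVPDYA"      # SEQ ID NO:1989 (LRQKD + GGGS + HA-tag)

FINGERS = {
    "ZIF268 F1": "YACPVESCDRRFSRSDELTRHIRIH",
    "ZIF268 F2": "FQCRICMRNFSRSDHLSTHIRTH",
    "Sp1 F3": "FSCPICEKRFMRSDHLTKHARRH",
}

def assemble_three_finger(a, b, c):
    """Join leader, three linker-separated fingers (N- to C-terminal), and tag."""
    return LEADER + LINKER.join(FINGERS[name] for name in (a, b, c)) + TAG

peptide = assemble_three_finger("ZIF268 F1", "ZIF268 F2", "Sp1 F3")
print(peptide)
print(len(peptide))
```

The remaining five modules of Table 3 could be added to the dictionary in the same way.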

In the first library (library 1), the 8 zinc finger modules are recombined in random order
to create 3-finger peptides with all possible combinations of the 8 zinc finger modules.
Such a procedure results in a library diversity of 512 (=8^3), comprising peptides that are
predicted to bind to any possible combination of the binding sites assigned in Table 3.
Library 1 allows novel 3-finger domains to be selected as a unit, for specified 9 bp target
sequences. Such 3-finger units may be used for the construction of poly-zinc finger
peptides as described in Moore, M., Choo, Y. & Klug, A. (2001) Proc. Natl. Acad. Sci.
USA 98: 1432-1436; and WO 01/53480.

In the second library (library 2), the 8 zinc finger modules are randomly recombined to
create 2-finger peptides which are all joined to the C-terminus of Zif268 finger 1. The
invariant finger 1 acts as an anchor for the selection, both by providing extra affinity to
stabilise the selection, and by fixing the register of the protein-DNA interaction (as
discussed supra). Such a library has a diversity of 64 (=8^2), and allows novel 2-finger
units to be selected for a given 6 bp target sequence. The resulting 2-finger units can be
recovered by PCR and used in the construction of poly-zinc finger peptides (based on
strings of 2-finger units), as described in WO 01/53480.
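
The two library sizes follow directly from counting ordered combinations of the 8 modules. A minimal sketch, assuming the modules are simply numbered 1 to 8 as in Table 3 and that module 1 (Zif268 finger 1) is the fixed N-terminal anchor of library 2:

```python
from itertools import product

modules = range(1, 9)  # the 8 finger modules, numbered as in Table 3

# Library 1: every position of the 3-finger peptide can hold any module.
library1 = list(product(modules, repeat=3))

# Library 2: module 1 is fixed first; the other two positions vary.
library2 = [(1, b, c) for b, c in product(modules, repeat=2)]

print(len(library1), len(library2))
```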

These two libraries (encoding 3-finger peptides) are screened, as described below, for the
ability of their encoded proteins to bind three different 9 bp binding sequences: 5'-GCG-
TGG-GCG-3'; 5'-GGA-TAA-GCG-3'; and 5'-GCC-GAG-TGG-3'.

As positive controls, the genes encoding the 3-finger peptides predicted to bind the above 
target sequences are specifically constructed and tested in a similar manner. 



X    FINGER/UNIT    SEQ ID NO:    PEPTIDE SEQUENCE               SITE
1    ZIF268 F1      1979          YACPVESCDRRFSRSDELTRHIRIH      GCG
2    ZIF268 F2      1980          FQCRICMRNFSRSDHLSTHIRTH        TGG
3    Sp1 F3         1981          FSCPICEKRFMRSDHLTKHARRH        GGG
4    WT1 F1         1982          FMCAYPGCNKRYFKLSHLQMHSRKH      GAG
5    015391         1983          FVCPFDVCNRKFAQSTNLKTHILTH      TAA^
6    075626         1984          FKCQTCNKGFTQLAHLQKHYLVH        GGA^
7    ZN45-AAA       1985          YKCEECGKGFSQASNLLAHQRGH        AAA^
8    Z165-GCC       1986          YECNECGKSFAESSDLTRHRRIH        GCC^
9    leader         16            MAEERP
10   linker         3             TGEKP
11   G3S-HA-tag     1989          LRQKDGGGSYPYDVPDYA*

^ Predicted binding site. * indicates a translation stop codon.



Table 3. Nomenclature, amino acid sequences and known or predicted binding sequences 
("SITE") of zinc finger modules and other peptide units used in library construction. 

a. Human Zinc Finger Mini-Library Construction. 

Two libraries are prepared, according to the scheme shown in Figure 2. The N-terminal
finger of the 3-finger construct is referred to as 'cassette A'. The central finger is
encoded by cassette B, and the third (C-terminal) finger module is called cassette C.

Zinc Finger Cassettes

Polynucleotide sequences encoding the amino acid sequences of the 8 zinc finger
modules shown in Table 3 are determined, taking into account E. coli codon preferences,
and the corresponding nucleotide sequences are synthesised as single stranded
oligonucleotides, examples of which are shown in Table 4. Also shown are the sequences
of exemplary linkers and an exemplary 3'-tag required for the assembly of 3-finger
domains. Double stranded cassettes encoding the zinc finger modules and relevant
leader, linker, and terminator sequences are generated by PCR according to the procedure
described below, using the appropriate oligonucleotide templates of Table 4, and primers
of Table 5.
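
Back-translation with a single preferred codon per residue can be sketched as follows. The codon table below is an assumption: it was inferred by reading the leader, linker and finger sequences of Table 4 against their peptides, and is not a table given in the text. The tag oligonucleotide of Table 4 uses some different codons, so this sketch is not expected to reproduce it.

```python
# One codon per amino acid, inferred from the Table 4 sequences (assumption).
CODON = {
    "A": "GCG", "C": "TGC", "D": "GAT", "E": "GAA", "F": "TTT",
    "G": "GGC", "H": "CAT", "I": "ATT", "K": "AAA", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCG", "Q": "CAG", "R": "CGT",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAT",
}

def back_translate(peptide):
    """Return a coding sequence using one fixed codon per residue."""
    return "".join(CODON[residue] for residue in peptide)

leader_dna = back_translate("MAEERP")
linker_dna = back_translate("TGEKP")
sp1_f3_dna = back_translate("FSCPICEKRFMRSDHLTKHARRH")
print(leader_dna, linker_dna)
```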



X    CODE          FINGER              SEQ ID NO    NUCLEOTIDE SEQUENCE
1    AS144         ZIF268 F1           1990         TATGCGTGCCCGGTGGAAAGCTGCGATCGTCGTTTTAG [remainder not legible in source]
2    [illegible]   ZIF268 F2           1991         TTTCAGTGCCGTATTTGCATGCGTAACTTTAGCCGTAGCGATCATCTGAGCACCCATATTCGTACCCAT
3    AS148         Sp1 F3              1992         TTTAGCTGCCCGATTTGCGAAAAACGTTTTATGCGTAGCGATCATCTGACCAAACATGCGCGTCGTCAT
4    AS149         WT1 F1              1993         TTTATGTGCGCGTATCCGGGCTGCAACAAACGTTATTTTAAACTGAGCCATCTGCAGatgCATAGCCGTAAACAT
5    AS150         015391              1994         TTTGTGTGCCCGTTTGATGTGTGCAACCGTAAATTTGCGCAGAGCACCAACCTGAAAACCCATATTCTGACCCAT
6    AS151         075626              1995         TTTAAATGCCAGACCTGCAACAAAGGCTTTACCCAGCTGGCGCATCTGCAGAAACATTATCTGGTGCAT
7    AS152         ZN45-AAA            1996         TATAAATGCGAAGAATGCGGCAAAGGCTTTAGCCAGGCGAGCAACCTGCTGGCGCATCAGCGTGGCCAT
8    AS153         Z165-GCC            1997         TATGAATGCAACGAATGCGGCAAAAGCTTTGCGGAAAGCAGCGATCTGAGCCGTCATCGTCGTATTCAT
9                  MAEERP leader       1998         ATGGCGGAAGAACGTCCG
10                 TGEKP linker        1999         ACCGGCGAAAAACCG
11                 G3S-HA-tag (tag)    2000         CATCTGCGCCAGAAGGACGGCGGCGGCAGCTATCCGTATGATGTGCCGGATTATGCGTAA

Table 4. Nucleotide sequences encoding zinc finger modules and other peptide sequences
used in the construction of 3-finger proteins.



X    CODE     NAME         SEQ ID NO    SEQUENCE
1    AS85     pETFwd1      2001         CGCTGACTTCCGCGTTTCC
2    AS86     SDRev        2002         ATGTATATCTCCTTCTTAAAGTT
3    AS93     ZnF1Fwd      2003         AACTTTAAGAAGGAGATATACATATGGCGGAAGAACGTCCGTATGCGTGCCCGGTGGAAAG
4    AS94     ZnF2Fwd      2004         AACTTTAAGAAGGAGATATACATATGGCGGAAGAACGTCCGTTTCAGTGCCGTATTTGCATG
5    AS95     ZnF3Fwd      2005         AACTTTAAGAAGGAGATATACATATGGCGGAAGAACGTCCGTTTAGCTGCCCGATTTGCG
6    AS96     ZnF4Fwd      2006         AACTTTAAGAAGGAGATATACATATGGCGGAAGAACGTCCGTTTATGTGCGCGTATCCGGG
7    AS97     ZnF5Fwd      2007         AACTTTAAGAAGGAGATATACATATGGCGGAAGAACGTCCGTTTATGTGCGCGTATCCGGG
8    AS98     ZnF6Fwd      2008         AACTTTAAGAAGGAGATATACATATGGCGGAAGAACGTCCGTTTAAATGCCAGACCTGCAAC
9    AS99     ZnF7Fwd      2009         AACTTTAAGAAGGAGATATACATATGGCGGAAGAACGTCCGTATAAATGCGAAGAATGCGGC
10   AS100    ZnF8Fwd      2010         AACTTTAAGAAGGAGATATACATATGGCGGAAGAACGTCCGTATGAATGCAACGAATGCGGC
11   AS101    1Link1Rev    2011         CGGTTTTTCGCCGGTATGAATACGAATATGACGGG
12   AS102    1Link2Rev    2012         CGGTTTTTCGCCGGTATGGGTACGAATATGGGTGC
13   AS103    1Link3Rev    2013         CGGTTTTTCGCCGGTATGACGACGCGCATGTTTGG
14   AS104    1Link4Rev    2014         CGGTTTTTCGCCGGTATGTTTACGGCTATGCATCTG
15   AS105    1Link5Rev    2015         CGGTTTTTCGCCGGTATGGGTCAGAATATGGGTTTTC
16   AS106    1Link6Rev    2016         CGGTTTTTCGCCGGTATGCACCAGATAATGTTTCTGC
17   AS107    1Link7Rev    2017         CGGTTTTTCGCCGGTATGGCCACGCTGATGCGC
18   AS108    1Link8Rev    2018         CGGTTTTTCGCCGGTATGAATACGACGATGACGGG
19   AS109    1Link1Fwd    2019         CATACCGGCGAAAAACCGTATGCGTGCCCGGTGGAAAG
10   AS110    1Link2Fwd    2020         CATACCGGCGAAAAACCGTTTCAGTGCCGTATTTGCATG
11   AS111    1Link3Fwd    2021         CATACCGGCGAAAAACCGTTTAGCTGCCCGATTTGCG
12   AS112    1Link4Fwd    2022         CATACCGGCGAAAAACCGTTTATGTGCGCGTATCCGGG
13   AS113    1Link5Fwd    2023         CATACCGGCGAAAAACCGTTTGTGTGCCCGTTTGATGTG
14   AS114    1Link6Fwd    2024         CATACCGGCGAAAAACCGTTTAAATGCCAGACCTGCAAC
15   AS115    1Link7Fwd    2025         CATACCGGCGAAAAACCGTATAAATGCGAAGAATGCGGC
16   AS116    1Link8Fwd    2026         CATACCGGCGAAAAACCGTATGAATGCAACGAATGCGGC
17   AS117    2Link1Rev    2027         TGGCTTCTCACCCGTGTGATGAATACGAATATGACGGGTC
18   AS118    2Link2Rev    2028         TGGCTTCTCACCCGTGTGATGGGTACGAATATGGGTGC
19   AS119    2Link3Rev    2029         TGGCTTCTCACCCGTGTGATGACGACGCGCATGTTTGG
20   AS120    2Link4Rev    2030         TGGCTTCTCACCCGTGTGATGTTTACGGCTATGCATCTG
21   AS121    2Link5Rev    2031         TGGCTTCTCACCCGTGTGATGGGTCAGAATATGGGTTTTC
22   AS122    2Link6Rev    2032         TGGCTTCTCACCCGTGTGATGCACCAGATAATGTTTCTGC
23   AS123    2Link7Rev    2033         TGGCTTCTCACCCGTGTGATGGCCACGCTGATGCGC
24   AS124    2Link8Rev    2034         TGGCTTCTCACCCGTGTGATGAATACGACGATGACGGG
25   AS125    2Link1Fwd    2035         CACGGGTGAGAAGCCATATGCGTGCCCGGTGGAAAG
26   AS126    2Link2Fwd    2036         CACGGGTGAGAAGCCATTTCAGTGCCGTATTTGCATG
27   AS127    2Link3Fwd    2037         CACGGGTGAGAAGCCATTTAGCTGCCCGATTTGCG
28   AS128    2Link4Fwd    2038         CACGGGTGAGAAGCCATTTATGTGCGCGTATCCGGG
29   AS129    2Link5Fwd    2039         CACGGGTGAGAAGCCATTTGTGTGCCCGTTTGATGTG
30   AS130    2Link6Fwd    2040         CACGGGTGAGAAGCCATTTAAATGCCAGACCTGCAAC
31   AS131    2Link7Fwd    2041         CACGGGTGAGAAGCCATATAAATGCGAAGAATGCGGC
32   AS132    2Link8Fwd    2042         CACGGGTGAGAAGCCATATGAATGCAACGAATGCGGC
33   AS133    3HA1Rev      2043         CTAGGAATTCTTACGCATAATCCGGCACATCATACGGATAGCTGCCGCCGCCGTCCTTCTGGCGCAGATGAATACGAATATGACGGGTC
34   AS134    3HA2Rev      2044         CTAGGAATTCTTACGCATAATCCGGCACATCATACGGATAGCTGCCGCCGCCGTCCTTCTGGCGCAGATGGGTACGAATATGGGTGC
35   AS135    3HA3Rev      2045         CTAGGAATTCTTACGCATAATCCGGCACATCATACGGATAGCTGCCGCCGCCGTCCTTCTGGCGCAGATGACGACGCGCATGTTTGG
36   AS136    3HA4Rev      2046         CTAGGAATTCTTACGCATAATCCGGCACATCATACGGATAGCTGCCGCCGCCGTCCTTCTGGCGCAGATGTTTACGGCTATGCATCTG
37   AS137    3HA5Rev      2047         CTAGGAATTCTTACGCATAATCCGGCACATCATACGGATAGCTGCCGCCGCCGTCCTTCTGGCGCAGATGGGTCAGAATATGGGTTTTC
38   AS138    3HA6Rev      2048         CTAGGAATTCTTACGCATAATCCGGCACATCATACGGATAGCTGCCGCCGCCGTCCTTCTGGCGCAGATGCACCAGATAATGTTTCTGC
39   AS139    3HA7Rev      2049         CTAGGAATTCTTACGCATAATCCGGCACATCATACGGATAGCTGCCGCCGCCGTCCTTCTGGCGCAGATGGCCACGCTGATGCGC
40   AS140    3HA8Rev      2050         CTAGGAATTCTTACGCATAATCCGGCACATCATACGGATAGCTGCCGCCGCCGTCCTTCTGGCGCAGATGAATACGACGATGACGGG
41   AS141    Rev3         2051         CTAGGAATTCTTACGCATAATC
42   AS142    1LinkRev     2052         CGGTTTTTCGCCGGTATG
43   AS143    2LinkRev     2053         TGGCTTCTCACCCGTGTG


Table 5. Modifying oligonucleotides used for mini-library construction. 



1. Library 1.

Once made into double stranded DNA cassettes, the finger units are attached to T7
upstream expression sequences by PCR overlap extension, using the following protocol.

(a) Upstream sequences are first extracted from pET23a by PCR using primers
pETFwd1 and SDRev, generating the fragment pET5'.

(b) The fingers for cassette A are amplified with forward primers ZnFxFwd
(AS93-100) and reverse primers 1LinkxRev (AS101-AS108), where x is the number of a
particular finger from Tables 3 and 4, as indicated.

(c) The fingers for cassette B are amplified with forward primers 1LinkxFwd
(AS109-116) and reverse primers 2LinkxRev (AS117-AS124), where x refers to the
finger module number.

(d) The fingers for cassette C are amplified with forward primers 2LinkxFwd
(AS125-132) and reverse primers 3HAxRev (AS133-AS140), where x refers to the
appropriate zinc finger module.

The steps to create cassettes A, B and C are performed separately; however, mixed
populations of template oligonucleotides can be added to each PCR of steps (a), (b), and
(c) to produce a library of each cassette.
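
Overlap extension relies on the 3' end of one cassette sharing sequence with the 5' end of the next. The sketch below checks this for one cassette A / cassette B pair, using the primers 1Link1Rev and 1Link2Fwd as listed in Table 5. The helper functions are illustrative and not part of the described protocol.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def overlap(left, right):
    """Longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return right[:k]
    return ""

link1_rev_finger1 = "CGGTTTTTCGCCGGTATGAATACGAATATGACGGG"      # 1Link1Rev (AS101)
link1_fwd_finger2 = "CATACCGGCGAAAAACCGTTTCAGTGCCGTATTTGCATG"  # 1Link2Fwd (AS110)

# Sense-strand 3' end of cassette A is the reverse complement of its reverse primer.
cassette_a_end = revcomp(link1_rev_finger1)
shared = overlap(cassette_a_end, link1_fwd_finger2)
print(shared, len(shared))
```

The shared segment is the reverse complement of the generic primer 1LinkRev (AS142) and spans the TGEKP linker codons together with the preceding histidine codon.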

The final 3-finger library is assembled by overlap extension as outlined in Figure 2. In
the first step the mixed pool of cassette A is appended to the upstream sequences, pET5'.
Equimolar amounts are mixed and PCR-cycled in the absence of primers. The reaction
product is either purified immediately or reamplified before purification using primers
pETFwd1 and 1LinkRev.

In the second step cassette B (mixed pool) is appended to the product of the above step.
Again, equimolar amounts are mixed and PCR-cycled in the absence of primers. The
reaction product is either purified immediately or reamplified before purification using
primers pETFwd1 and 2LinkRev.

In the final step cassette C (mixed pool) is appended to the above product. Equimolar
amounts are mixed and PCR-cycled in the absence of primers. As before, the reaction
product may be purified immediately or reamplified before purification using primers
pETFwd1 and Rev3 (see also Figure 2).

2. Library 2.

Library 2 is assembled in a similar manner to Library 1 except that cassette A is
represented by Zif268 finger 1 only.

The final PCR products containing T7 promoter sequences and encoding 3-finger
peptides attached to an HA-antibody tag are purified and used for the production of
protein.

b. Zinc Finger Library Screening.

Two exemplary methods for screening zinc finger libraries, such as those produced
above, are described in Protocol A and Protocol B, below.
Protocol A:

The peptides of library 1 and library 2 are screened to select 3-zinc finger domains which
bind the sequences: 5'-GCG-TGG-GCG-3'; 5'-GGA-TAA-GCG-3'; and 5'-GCC-GAG-
TGG-3'. Since library 2 contains Zif268 finger 1 in the N-terminal position, in theory,
these peptides should only bind the sequences 5'-GCG-TGG-GCG-3' and 5'-GGA-
TAA-GCG-3'. Hence, library 2 is effectively used to select the 2-finger units which bind
most strongly to the 6 bp sequences 5'-GCG-TGG-3' and 5'-GGA-TAA-3'. Double
stranded binding sites for use in the selection protocol are generated by annealing the
complementary oligonucleotides: Zif-b site and Zif site RC (AS154 and AS155);
#1#5#6.b and #1#5#6 RC (AS156 and AS157); and #2#4#8.b and #2#4#8 RC (AS158
and AS159). The top strand of each binding site is biotinylated, allowing capture of
binding site/zinc finger/HA-antibody ternary complexes to the streptavidin-coated plate in
an ELISA screening assay. The oligonucleotides are displayed in Table 6, below.

X    Code     Name          SEQ ID NO    Sequence
1    AS154    Zif-b site    2054         TTTTTTTTTTGCGTGGGCGTTTTTTTTTT
2    AS155    Zif site RC   2055         AAAAAAAAAACGCCCACGCAAAAAAAAAA
3    AS156    #1#5#6.b      2056         TTTTTTTTTTGGATAAGCGTTTTTTTTTT
4    AS157    #1#5#6 RC     2057         AAAAAAAAAACGCTTATCCAAAAAAAAAA
5    AS158    #2#4#8.b      2058         TTTTTTTTTGCCTGTTGGTTTTTTTTTTT
6    AS159    #2#4#8 RC     2059         AAAAAAAAAAACCAACAGGCAAAAAAAAA

Table 6. Oligonucleotide sequences used to generate double stranded binding sites used 
in the selection procedure. 
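
The binding-site oligonucleotides pair a top strand with its reverse complement. A minimal sketch of how such a pair could be generated for a 9 bp core site follows. The 10-nucleotide poly-T flanks are an assumption modelled on most entries of Table 6, and the function name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def binding_site_pair(core, flank=10):
    """Return (top, bottom) strands: poly-T flanked core and its reverse complement."""
    top = "T" * flank + core + "T" * flank
    return top, revcomp(top)

top, bottom = binding_site_pair("GGATAAGCG")  # core of the 5'-GGA-TAA-GCG-3' site
print(top)
print(bottom)
```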

The PCR-amplified 3-finger constructs are gel-purified from a 1% TAE-agarose gel using
the Gel Extraction Kit (Qiagen) and quantified based on absorbance at 260 nm. Dilutions
(in 0.25 mg/ml λ DNA) of DNA template encoding either library 1 or 2 are prepared
at final total template concentrations of 4.2 fM and 1 fM, respectively. At these
concentrations 1 µl of template contains approximately 2500 and 600 molecules of library
1 or library 2, respectively. At such low concentrations, the samples must be PCR
amplified to generate enough template for protein expression. Hence, these 1 µl aliquots
are taken and added to 1 ml PCR pre-mix, containing primers Rev3 (AS141) and
pETFwd2 (primer sequences shown below, see Table 7). The PCR pre-mixes are then
aliquoted into 96 (or 384) well plates at 10 µl per well, which is the equivalent of
approximately 25 or 6 molecules of library 1 or library 2 template, respectively.
Templates are amplified using 30 cycles of PCR. After this first round of PCR, 0.5 µl
aliquots of PCR product are added to new 10 µl PCR pre-mixes (in 96 or 384 well
format), containing nested primers, pETFwd3 and Rev3, and amplified for another 30
cycles. The resultant product is concentrated enough to perform in vitro transcription /
translation.
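
The molecule counts quoted above can be checked with Avogadro's number. This is a back-of-envelope sketch. The per-well figure ignores the extra 1 µl of template added to the 1 ml pre-mix, which is an approximation made here.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def molecules(conc_molar, volume_litres):
    """Number of molecules in a given volume at a given molar concentration."""
    return conc_molar * volume_litres * AVOGADRO

lib1 = molecules(4.2e-15, 1e-6)   # 4.2 fM template, 1 µl
lib2 = molecules(1.0e-15, 1e-6)   # 1 fM template, 1 µl

well_fraction = 10 / 1000         # a 10 µl aliquot of the 1 ml pre-mix
print(round(lib1), round(lib2))
print(round(lib1 * well_fraction), round(lib2 * well_fraction))
```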

In vitro translation experiments using TNT PCR coupled transcription-translation mix
(Promega) are assembled according to the manufacturer's instructions. Typically 5 µl
final volume contains 1 µl of each PCR product and 4 µl rabbit reticulocyte pre-mix
(containing 20 nM methionine, 12.5 µg/ml λ Hind III digest (Roche), 500 µM ZnCl2
(Sigma), 0.7 µl H2O, 40 nM PCR-amplified DNA template). Reactions are incubated at
30°C for 90 minutes. 50 µl PBS binding buffer containing 0.1% BSA (Sigma), 0.5%
Tween 20 (Sigma), 50 µM ZnCl2, 10 nM of the appropriate biotinylated binding site and 25
mU/ml rat 3F10 anti-HA HRP conjugate (Roche) is added to the translation mix and
incubated for 45 minutes at room temperature. The binding mix is thereafter transferred
to pre-blocked black streptavidin-coated 8-well strips or 96 / 384 well plates (Roche), and
the ternary complexes containing 3-finger peptide, biotinylated binding site and anti-HA
HRP antibody are captured while shaking at 200 rpm for 45 minutes at room temperature.
The wells are then washed five times with 100 µl PBS binding buffer containing 0.1%
BSA (Sigma), 0.5% Tween 20 (Sigma), 50 µM ZnCl2 to remove unbound components.
Finally, the retained HRP activity is measured by adding 50 µl QuantaBlu fluorogenic
HRP substrate (Pierce). Figure 3 demonstrates the capture and detection of target site-
binding zinc finger peptides using the assay described. Fluorescence is measured on a
SpectraMax Gemini XS (Molecular Devices) fluorescence microplate reader at 320 nm
excitation, 433 nm emission and 420 nm cut-off values.

The wells that give the highest levels of fluorescence are those which contain the highest
number of, or tightest binding, 3-finger peptides. PCR products from the second PCR
amplification stage, corresponding to such samples, are purified from TAE-agarose gels
and quantified, as above. Pure PCR products are diluted to approximately 50 molecules
per µl (which is equivalent to approximately 100 aM concentration) in 0.25 mg/ml λ
DNA. As above, 1 µl samples of template are added to 1 ml PCR pre-mix containing
primers pETFwd4 and Rev3. 10 µl aliquots are placed in each well of a 96 well plate.
At this stage, there is (on average) 0.5 template molecules per aliquot. Therefore,
generally speaking, half of the samples will contain no template and half will contain a
single template molecule. Samples are then PCR amplified using 30 cycles. Again, 0.5
µl PCR samples are taken from each well and amplified again by 30 cycles of PCR using
the nested primers, pETFwd5 and Rev3. 1 µl of each of these PCR products is used for
protein expression, as described above. At this stage, the highest levels of fluorescence
correspond to the samples containing the tightest binding 3-finger peptides. The PCR
product encoding such peptides is purified, as before, and can be sequenced to determine
the protein sequence of the optimal 3-zinc finger domain for the appropriate binding site.
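
The "half empty, half single" description is a rule of thumb. If template molecules are assumed to distribute randomly among the wells (a Poisson model, which is an assumption made here and not stated in the text), the expected occupancy at a mean of 0.5 molecules per aliquot can be estimated as follows.

```python
from math import exp, factorial

def poisson(k, mean):
    """Probability of exactly k template molecules in a well."""
    return mean ** k * exp(-mean) / factorial(k)

mean_per_well = 50 * (10 / 1000)   # 50 molecules in 1 ml, 10 µl per well

p_empty = poisson(0, mean_per_well)
p_single = poisson(1, mean_per_well)
p_multiple = 1 - p_empty - p_single

print(f"empty {p_empty:.3f}, single {p_single:.3f}, multiple {p_multiple:.3f}")
```

Under this model somewhat more than half of the wells are empty, and most occupied wells hold a single template, which is consistent with the intent of the dilution step.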

If further rounds of selection are required, PCR amplification can be conducted with the
nested primers pETFwd6, pETFwd9 and pETFwd7, also shown below (Table 7).



NAME       SEQ ID NO    SEQUENCE
pETFwd1    2060         CGCTGACTTCCGCGTTTCC
pETFwd2    2061         TCCAGACTTTACGAAACACGG
pETFwd3    2062         CGAAGACCATTCATGTTGTTGC
pETFwd4    2063         GTCGCAGACGTTTTGCAGC
pETFwd5    2064         GCAGTCGCTTCACGTTCGC
pETFwd6    2065         CGCTCGCGTATCGGTGATTC
pETFwd9    2066         CATTCTGCTAACCAGTAAGGC
pETFwd7    2067         GCCTAGCCGGGTCCTCAAC

Table 7: Primers used for PCR amplification of 3-finger cassettes (as constructed by the
procedure of Figure 2) to provide template used in screening zinc finger libraries.
Protocol B:

The peptides of library 2 were screened to select 3-zinc finger domains which bind the
sequences 5'-GCG-TGG-GCG-3' and 5'-GGG-AGG-CCT-3'. Double stranded binding
sites for use in the selection protocol were generated by annealing the complementary
oligonucleotides: Zif-b site and Zif site RC (AS154 and AS155, shown above), which
generated the 5'-GCG-TGG-GCG-3' binding site; and the oligonucleotides 5'-
TTTTTTTTTTGGGAGGCCTTTTTTTTTTT-3' (SEQ ID NO:2123) and 5'-
AAAAAAAAAAAGGCCTCCCAAAAAAAAAA-3' (SEQ ID NO:2124), which
generated the 5'-GGG-AGG-CCT-3' binding site. The top strand of each binding site
was biotinylated, allowing capture of binding site/zinc finger/HA-antibody ternary
complexes onto a streptavidin-coated plate in an ELISA screening assay.

The 3-finger library 2 constructs were cloned into the multiple cloning site of vector
pET23a (Novagen), using appropriate restriction sites. This library was then transformed
into E. coli and plated out to grow single colonies. 384 colonies (which should represent
the vast majority of the 64 member library) were picked into 2xYT media with ampicillin
and cultures grown at 37°C overnight. Library 2 expression cassettes were recovered
from bacteria by PCR using primers pETFwdx (where x is 1-7, e.g. pETFwd1) and Rev3
as described in Protocol A above.
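
The statement that 384 colonies should cover most of a 64-member library can be checked with a simple sampling estimate. The sketch assumes every library member is equally likely to be picked, which is an idealisation introduced here. Cloning or growth bias would lower the coverage.

```python
members = 64    # diversity of library 2
picked = 384    # colonies picked

# Probability that one particular member is absent from all picked colonies.
p_missed = (1 - 1 / members) ** picked

# Expected number of distinct members present among the picked colonies.
expected_seen = members * (1 - p_missed)

print(f"P(member missed) = {p_missed:.4f}")
print(f"expected members represented = {expected_seen:.2f} of {members}")
```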

In vitro coupled transcription / translation of PCR products was conducted as described
above, with the difference that each of the 384 zinc finger peptides was screened
individually in a well of a 384 well plate. The library was screened against the 5'-GCG-
TGG-GCG-3' and 5'-GGG-AGG-CCT-3' binding sites, as detailed in Protocol A. Wells
that yielded the highest levels of fluorescence were those which contained the tightest
binding 3-finger peptides. The ELISA results from the screen of the 384 samples against
the 5'-GCG-TGG-GCG-3' site are shown in Figure 4. Six constructs displayed
significant binding to the target site and these are termed C8, G16, I19, I23, J19 and K19
according to their coordinates on the 384-well plate. Similarly, one construct (B10)
showed strong binding to the 5'-GGG-AGG-CCT-3' target site. PCR products encoding
the tightest binding peptides can be purified, as described supra, and sequenced.

Some of the selected constructs, C8, J19, K19, I23 and G16 (which bind the 5'-GCG-TGG-
GCG-3' site) and B10 (which binds the 5'-GGG-AGG-CCT-3' site), were
screened against a range of different binding sites to test their specificity. The sites used
were: 5'-GCG-TGG-GCG-3'; 5'-CCA-CTC-GGC-3'; 5'-CCT-AGG-GGG-3'; 5'-GGA-
TAA-GCG-3'; 5'-GGG-AGG-CCT-3'; 5'-GCG-TAA-GGA-3'; and 5'-GCG-GGG-
GGA-3'. The binding assay was conducted as described above. The results (Figure 5)
show that the selected 3-zinc finger peptides bind preferentially to their target site, in
comparison to the alternative binding sites tested.

Example 5: Human Zinc Finger Module Libraries for Rapid Selection of 2-Finger
Units.

The preferred subunits of a poly-zinc finger construction strategy are in the form of two-
finger sub-domains. Assuming that there are 1,000 individual natural finger modules, a
library of all combinations of such zinc finger modules, in 2-finger units, would contain
1,000,000 members. All of the 1,000 natural finger modules would have to be made from
oligonucleotides, and the expense would be considerable. Furthermore, this figure is
likely to be an underestimate of the number of natural fingers. Hence, due to the huge
numbers of natural, human zinc finger modules available, it is advantageous to limit the
size of the libraries screened, as discussed in the Description. One way in which library
size can be reduced is to limit the library members to zinc finger modules which are
predicted to bind the desired sequence. For instance, based on the target sites in Example
1, if 2-finger domains are required to bind the sequence 5'-GCG-TGG-3', an individual
library can be constructed from the zinc finger modules predicted to bind the sequences
5'-GCG-3' and 5'-TGG-3'. Equally, if the sequence 5'-GGA-TAA-3' is to be targeted,
zinc finger modules predicted to bind the sequences 5'-GGA-3' and 5'-TAA-3' can
be used. Table 8 shows the natural, human zinc finger modules from Example 1, which
are predicted to bind the aforementioned 3 bp sequences.



5'-GCG-3'                5'-TGG-3'                5'-GGA-3'            5'-TAA-3'
Zif268 finger 1 (GCG)    Zif268 finger 2 (TGG)    BCL6 (NGA)           TYY1 (NAA)
Zif268 finger 3 (GCG)    MAZ finger 2 (TGG)       075626 (GGA)         015391 (YAA)
Sp1 finger 2 (GCG)       WT1 finger 3 (TGG)       ZN45 (N[?/T]A)       075626 (YAA)
WT1 finger 4 (GCG)       SP4 (NGG)                015535 (GNA)         ZN45 (N[?/T]A)
BTE1 (GCG)               BTE1 (NGG)               Q15776 (GNA)         Z136 (TNN)
043296 (GNG)             Z136 (TNN)               060893 (GNA)
Z174 (GCG, RNA)          Q15776 (NGG)             Z132 (a) (GGA)       Q15776 (a) (TNA)
Z202 (GCG, RNA)          ZN84 (YGG)               Z132 (b) (GGA)       Q15776 (b) (TNA)
                                                  Z132 (GGN)           Z195 (YAA)
                                                  ZN85 (GGA)           ZN84 (YAA)
                                                                       075346 (TAA)
                                                                       ZN43 (TAA)

Table 8. The natural, human zinc finger modules predicted to bind the sequences
5'-GCG-3', 5'-TGG-3', 5'-GGA-3' and 5'-TAA-3'.

On the basis of the specificities shown in Table 8, a library of 2-finger units to target the 6
bp sequence 5'-GCG-TGG-3' has 64 (8x8) members, and a library to target the sequence
5'-GGA-TAA-3' has 120 (10x12) members. To screen sample sizes of this magnitude
we can construct each 2-finger unit specifically (using, for example, an 8x8 or 10x12
matrix arrangement), and assay the samples containing individual clones using the
fluorescent-ELISA protocol of Example 4. Such a procedure can save time in
comparison to constructing all possible 64 or 120 variants in a random fashion (as a
library), as described in Example 4, because with a random library the number of
constructs screened would have to be considerably higher.
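
An 8x8 matrix arrangement can be laid out programmatically so that each well holds one defined 2-finger unit. The sketch below uses the module names from columns 1 and 2 of Table 8. The assignment of one column to plate rows and the other to plate columns, and the A1 to H8 well labels, are illustrative assumptions.

```python
from itertools import product
from string import ascii_uppercase

# Modules predicted to bind 5'-GCG-3' and 5'-TGG-3' (Table 8, columns 1 and 2).
gcg_fingers = ["Zif268 F1", "Zif268 F3", "Sp1 F2", "WT1 F4",
               "BTE1", "043296", "Z174", "Z202"]
tgg_fingers = ["Zif268 F2", "MAZ F2", "WT1 F3", "SP4",
               "BTE1", "Z136", "Q15776", "ZN84"]

# One well per pair: rows A-H index the GCG modules, columns 1-8 the TGG modules.
plate = {
    f"{ascii_uppercase[r]}{c + 1}": (gcg, tgg)
    for (r, gcg), (c, tgg) in product(enumerate(gcg_fingers), enumerate(tgg_fingers))
}

print(len(plate))
print(plate["A1"], plate["H8"])
```

Because every well's content is known in advance, a positive ELISA signal identifies the finger pair directly, without sequencing.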

a. Construction of 2-Finger Domains to Bind 5'-GCG-TGG-3' 

A 64 member, 2-finger library is constructed from the natural, human zinc finger modules
predicted to bind the sequences 5'-GCG-3' and 5'-TGG-3' (Table 8, columns 1 and 2).
The 2-finger library units are all attached to the C-terminus of Zif268 finger 1, which acts
as an anchor finger. The construction protocol is different from that described in
Example 4, as described below.

Zinc Finger Cassettes

Nucleotide sequences encoding the amino acid sequences of the 16 zinc finger modules
(Table 8, columns 1 and 2) are determined, taking into account human codon preferences,
and the corresponding nucleotide sequences are synthesised as single stranded
oligonucleotides, shown in Table 9. Double stranded cassettes encoding the zinc finger
modules and flanking linker sequences are generated by PCR using the appropriate
primers, shown in Table 10.



X    FINGER       SEQ ID NO    NUCLEOTIDE SEQUENCE
1    Zif268 F1    2068         TACGCCTGCCCCGTGGAGAGCTGCGACCGCCGCTTCAGCCGCAGCGACGAGCTGACCCGCCACATCCGCATCCAC
2    Zif268 F3    2069         TTCGCCTGCGACATCTGCGGCCGCAAGTTCGCCCGCAGCGACGAGCGCAAGCGCCACACCAAGATCCAC
3    Sp1 F2       2070         TTCGCCTGCAGCTGGCAGGACTGCAACAAGAAGTTCGCCCGCAGCGACGAGCTGGCCCGCCACTACCGCACCCAC
4    WT1 F4       2071         TTCAGCTGCCGCTGGCCCAGCTGCCAGAAGAAGTTCGCCCGCAGCGACGAGCTGGTGCGCCACCACAACATGCAC
5    BTE1         2072         TTCCCCTGCACCTGGCCCGACTGCCTGAAGAAGTTCAGCCGCAGCGACGAGCTGACCCGCCACTACCGCACCCAC
6    043296       2073         TACGAGTGCGTGGAGTGCGGCAAGGCCTTCACCCGCATGAGCGGCCTGACCCGCCACAAGCGCATCCAC
7    Z174         2074         TACAAGTGCGACGACTGCGGCAAGAGCTTCACCTGGAACAGCGAGCTGAAGCGCCACAAGCGCGTGCAC
8    Z202         2075         TACCGCTGCGACGACTGCGGCAAGCACTTCCGCTGGACCAGCGACCTGGTGCGCCACCAGCGCACCCAC
9    Zif268 F2    2076         TTCCAGTGCCGCATCTGCATGCGCAACTTCAGCCGCAGCGACCACCTGAGCACCCACATCCGCACCCAC
10   MAZ F2       2077         TACAACTGCAGCCACTGCGGCAAGAGCTTCAGCCGCCCCGACCACCTGAACAGCCACGTGCGCCAGGTGCAC
11   WT1 F3       2078         TTCCAGTGCAAGACCTGCCAGCGCAAGTTCAGCCGCAGCGACCACCTGAAGACCCACACCCGCACCCAC
12   Sp4          2079         CACAAGTGCCCCTACAGCGGCTGCGGCAAGGTGTACGGCAAGAGCAGCCACCTGAAGGCCCACTACCGCGTGCAC
13   BTE1         2080         CACAAGTGCCCCTACAGCGGCTGCGGCAAGGTGTACGGCAAGAGCAGCCACCTGAAGGCCCACTACCGCGTGCAC
14   Z136         2081         TTCGAGTGCAAGCGCTGCGGCAAGGCCTTCCGCAGCAGCAGCAGCTTCCGCCTGCACGAGCGCACCCAC
15   Q15776       2082         TACGAGTGCGACGAGTGCGGCAAGACCTTCCGCCGCAGCAGCCACCTGATCGGCCACCAGCGCAGCCAC
16   ZN84         2083         TACGAGTGCGGCGAGTGCGGCAAGGCCTTCAGCCGCAAGAGCCACCTGATCAGCCACTGGCGCACCCAC




ISTA Binding. 



Table 9. Nucleotide sequences of zinc finger modules and nucleotide sequences encoding
other peptide sequences used in the construction of peptides to bind the sequence
5'-GCG-TGG-3'.

The primers used to amplify the N-terminal finger of the pair (the equivalent of cassette
B, above) add TGEKP (SEQ ID NO:3) linker sequences, and the restriction site XmaI (5'-
CCC-GGG-3') at the 5' end and an AgeI site (5'-ACC-GGT-3') at the 3' end. AgeI and
XmaI create compatible ends, but have unique restriction sites. These primers are called
CasBxFwd and CasBxRev, respectively, where x refers to the number of the zinc finger
module in Table 9. The primers used to amplify the C-terminal finger of the pair (the
equivalent of cassette C, above) add TGEKP (SEQ ID NO:3) linker sequences, and the
restriction site XmaI at the 5' end and a sequence encoding LRQKDGGGS (SEQ ID
NO:2125), containing a restriction site for BamHI at the 3' end. These primers are
referred to as CasCxFwd and CasCxRev, respectively. The 16 individual zinc finger
cassettes are then purified using the QIAquick PCR purification kit (Qiagen).
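
The compatibility of AgeI and XmaI ends can be illustrated with a short calculation. The recognition sequences are those given in the text. The cut positions (one base in from the 5' end of each strand, i.e. C^CCGGG and A^CCGGT) are taken from general enzyme references rather than from this document and should be treated as an assumption of the sketch.

```python
def five_prime_overhang(site, cut):
    """Overhang left when a palindromic site is cut `cut` bases from each strand's 5' end."""
    return site[cut:len(site) - cut]

XMAI = ("CCCGGG", 1)   # recognition site from the text; cut position assumed
AGEI = ("ACCGGT", 1)   # recognition site from the text; cut position assumed

xma_end = five_prime_overhang(*XMAI)
age_end = five_prime_overhang(*AGEI)

# Junction formed when an AgeI-cut upstream end is ligated to an XmaI-cut downstream end.
junction = AGEI[0][:AGEI[1]] + age_end + XMAI[0][-XMAI[1]:]

print(xma_end, age_end, junction)
```

Both enzymes leave the same 4-base overhang, so the fragments ligate, while the resulting hybrid junction matches neither recognition site and is therefore not recut by either enzyme.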



Name        SEQ ID NO   Sequence

CasB9Fwd    2084        GATCCCCGGGGAGAAGCCCTTCCAGTGCCGCATCTGCAT
CasB10Fwd   2085        GATCCCCGGGGAGAAGCCCTACAACTGCAGCCACTGCGG
CasB11Fwd   2086        GATCCCCGGGGAGAAGCCCTTCCAGTGCAAGACCTGCCA
CasB12Fwd   2087        GATCCCCGGGGAGAAGCCCCACAAGTGCCCCTACAGCG
CasB13Fwd   2088        GATCCCCGGGGAGAAGCCCCACAAGTGCCCCTACAGCG
CasB14Fwd   2089        GATCCCCGGGGAGAAGCCCTTCGAGTGCAAGCGCTGCG
CasB15Fwd   2090        GATCCCCGGGGAGAAGCCCTACGAGTGCGACGAGTGCG
CasB16Fwd   2091        GATCCCCGGGGAGAAGCCCTACGAGTGCGGCGAGTGCG
CasC1Fwd    2092        GATCCCCGGGGAGAAGCCCTACGCCTGCCCCGTGGAG
CasC2Fwd    2093        GATCCCCGGGGAGAAGCCCTTCGCCTGCGACATCTGCG
CasC3Fwd    2094        GATCCCCGGGGAGAAGCCCTTCGCCTGCAGCTGGCAGG
CasC4Fwd    2095        GATCCCCGGGGAGAAGCCCTTCAGCTGCCGCTGGCCC
CasC5Fwd    2096        GATCCCCGGGGAGAAGCCCTTCCCCTGCACCTGGCCC
CasC6Fwd    2097        GATCCCCGGGGAGAAGCCCTACGAGTGCGTGGAGTGCG
CasC7Fwd    2098        GATCCCCGGGGAGAAGCCCTACAAGTGCGACGACTGCGG
CasC8Fwd    2099        GATCCCCGGGGAGAAGCCCTACCGCTGCGACGACTGCG
CasB9Rev    2100        CTTCTCACCGGTGTGGGTGCGGATGTGGGTG
CasB10Rev   2101        CTTCTCACCGGTGTGCACCTGGCGCACGTG
CasB11Rev   2102        CTTCTCACCGGTGTGGGTGCGGGTGTGGGT
CasB12Rev   2103        CTTCTCACCGGTGTGCACGCGGTAGTGGGC
CasB13Rev   2104        CTTCTCACCGGTGTGCACGCGGTAGTGGGC
CasB14Rev   2105        CTTCTCACCGGTGTGGGTGCGCTCGTGCAG
CasB15Rev   2106        CTTCTCACCGGTGTGGCTGCGCTGGTGGCC
CasB16Rev   2107        CTTCTCACCGGTGTGGGTGCGCCAGTGGCT
CasC1Rev    2108        GATCGGATCCGCCGCCGTCCTTCTGGCGCAGGTGGATGCGGATGTGGCGG
CasC2Rev    2109        GATCGGATCCGCCGCCGTCCTTCTGGCGCAGGTGGATCTTGGTGTGGCGC
CasC3Rev    2110        GATCGGATCCGCCGCCGTCCTTCTGGCGCAGGTGGGTGCGGTAGTGGCG
CasC4Rev    2111        GATCGGATCCGCCGCCGTCCTTCTGGCGCAGGTGCATGTTGTGGTGGCGC
CasC5Rev    2112        GATCGGATCCGCCGCCGTCCTTCTGGCGCAGGTGGGTGCGGTAGTGGCG
CasC6Rev    2113        GATCGGATCCGCCGCCGTCCTTCTGGCGCAGGTGGATGCGCTTGTGGCGG
CasC7Rev    2114        GATCGGATCCGCCGCCGTCCTTCTGGCGCAGGTGCACGCGCTTGTGGCG
CasC8Rev    2115        GATCGGATCCGCCGCCGTCCTTCTGGCGCAGGTGGGTGCGCTGGTGGCG
ScaRev      2116        GTCATGCCATCCGTAAGATGC
GSFwd       2117        GGCGGATCCTATCCGTATGATGTG
Zif1Fwd     2118        AGAGAGAGAGAGATCTATGGCGGAAGAACGTCCGTATGCGTGCCCGGTGGAAAG
Zif1Rev     2119        AGCCGGATCCCAAACACCGGTATGAATACGAATATGACGGG
pETRev1     2120        AGTGTAGCGGTCACGCTGC

Table 10. Oligonucleotides used for PCR construction of rapid zinc finger library.
Annealing sequences are shown in bold, restriction sites are underlined.
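
The restriction sites carried by the cassette primers can be checked directly against the primer sequences. The following is a minimal sketch, not part of the disclosed procedure: it assumes the standard recognition sequences for XmaI (CCCGGG), AgeI (ACCGGT) and BamHI (GGATCC), and uses four primers from Table 10 as examples. The function name `sites_in` is hypothetical.

```python
# Standard recognition sequences (assumed here; not listed in the text itself).
SITES = {"XmaI": "CCCGGG", "AgeI": "ACCGGT", "BamHI": "GGATCC"}

# Four example primers copied from Table 10.
PRIMERS = {
    "CasB9Fwd": "GATCCCCGGGGAGAAGCCCTTCCAGTGCCGCATCTGCAT",
    "CasB9Rev": "CTTCTCACCGGTGTGGGTGCGGATGTGGGTG",
    "CasC1Fwd": "GATCCCCGGGGAGAAGCCCTACGCCTGCCCCGTGGAG",
    "CasC1Rev": "GATCGGATCCGCCGCCGTCCTTCTGGCGCAGGTGGATGCGGATGTGGCGG",
}


def sites_in(seq):
    """Return the sorted names of the enzymes whose site occurs in seq."""
    seq = seq.upper()
    return sorted(name for name, site in SITES.items() if site in seq)


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, sites_in(seq))
```

In this sketch the forward primers carry only the XmaI site, the cassette B reverse primer only AgeI, and the cassette C reverse primer only BamHI, which agrees with the cloning scheme described for the two cassettes.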



3-Finger Library Peptides

The 2 natural zinc finger modules for each construct are appended to the C-terminus of
Zif268 finger 1 (as in Example 4, library 2). Hence, a plasmid construct containing
Zif268 finger 1 and appropriate restriction sites for cloning of the two natural finger
modules is also prepared. The construction and cloning procedure for the 3-finger
libraries follows (see also Figure 6).

(a) The plasmid pET23a/TFZ-HA was assembled by PCR amplification of
plasmid pTFZ-KOX (described in co-owned WO 01/53480) with primers AS1 and AS2.

The sequences of these primers are as follows:

AS1: CGATGGATCCATGGGAGAGAAGGCGCTGC (SEQ ID NO:2126)
AS2: GCGTAAAGCTTACGCATAATCCGGCACATCATACGGATAAGAG
CCGCGGCGGTGGTTGTGTGTTAAATGGATTT (SEQ ID NO:2127)

The PCR product was gel purified and digested with BamHI and HindIII, then
repurified and cloned into BamHI/HindIII-digested pET23a vector (Novagen), yielding
pET23a/TFZ-HA. A number of clones were picked and sequenced to verify the
correctness of the inserts.

(b) A fragment of approximately 1.2 kb is amplified from the vector
pET23a/TFZ-HA, using the primers ScaRev and GSFwd (Table 10). This fragment
contains the HA-epitope tag sequence (YPYDVPDYA* (SEQ ID NO:2122)) and part of
the GGGS (SEQ ID NO:1988) linker sequence at the 5' end. Additionally, the GSFwd
primer adds a BamHI site at the extreme 5' end. The ScaRev primer does not contain a
restriction site, but a ScaI site from the vector is present approximately 40 bp downstream
of the primer binding site. This fragment is cut with BamHI and ScaI and inserted into
similarly cut pET23a.

(c) Zif268 finger 1 is then amplified using the PCR primers Zif1Fwd and Zif1Rev
(Table 10), which add a BglII site at the 5' end and both AgeI and BamHI sites at the 3'
end. This construct is then cut with BglII and BamHI and inserted into the vector
construct made in step (b), which has been linearised with BamHI. At this stage the new
construct, termed pET23aZif1HA, is sequenced to identify correctly oriented zinc finger
inserts.

(d) Oligonucleotides encoding zinc finger modules for the C-terminus of the 3-finger
constructs (cassette C) are amplified using the primers CasCxFwd and CasCxRev
(where x is 1 to 8, see Table 10). These cassettes are then digested with the restriction
enzyme BamHI, and inserted into BamHI-cut, dephosphorylated pET23aZif1HA. At this
stage the new vector construct is not recircularised.

(e) Oligonucleotides encoding zinc finger modules for cassette B are amplified
using primers CasBxFwd and CasBxRev (where x is 9 to 16, see Table 10). These
fragments are cut with the enzymes XmaI and AgeI at 37°C for 1-2 hours. The linear
vector produced in stage (d) above is also cut with AgeI and XmaI (as described), and
dephosphorylated. Digested cassette B fragments are ligated into the AgeI/XmaI-cut vector
in the presence of the restriction enzymes AgeI and XmaI at room temperature for 16
hours. During this incubation, incorrectly ligated fragments are re-digested and re-ligated
repeatedly, until the majority (or all) of the inserts are in the desired orientation. Correct
3-finger constructs have the assembly depicted in Figure 6.

(f) Finally, 3-finger constructs are amplified from the ligated vector (produced in
step (e)) using the primers pETFwd1 (Table 5) and pETRev1 (Table 10). 1 µl of each
ligation mixture is amplified in a 10 µl (total volume) PCR reaction for 30 cycles.
Alternatively, the ligated vector can be transformed into bacteria to produce samples
containing single zinc finger clones.

The above procedure results in the majority of PCR products being the correct 3-finger
constructs, so that any incorrect fragments will not significantly affect the selection
protocol, and the PCR products can be used for screening without further processing.
Alternatively, 3-finger PCR products may be purified from an agarose gel before use.

b. Screening of the Library Against 5'-GCG-TGG-GCG-3'

Members of the zinc finger library can be screened against the desired target site from a
mixed population of clones, or from individual clones, as described in Example 4,
Protocol A or Protocol B (above), respectively. The target site for the screen is produced
by annealing the oligonucleotides Zif.b site (AS154) and Zif site RC (AS155), as before.
Template for protein expression is in each case made by PCR using primers pETFwd1
(Table 5) and pETRev1 (Table 10). 1 µl of each PCR reaction is used to express protein
and screen for binding to the Zif site in the manner described in Example 4. The DNA
corresponding to the samples giving the highest fluorescence signals is collected, purified
from a 1% TAE-agarose gel, and sequenced to determine the sequence of the optimal
binding 3-finger peptide.

Example 6: Reduced Human Zinc Finger Module Library for Universal DNA 
Recognition. 

A library system similar to that described in Example 5 can be constructed using zinc
finger modules from databases such as those in Examples 1, 2 and 3 to select 2-finger
units which bind any 2-finger (6 bp) recognition sequence. There are only 4096 (= 4^6)
unique 6 bp sequences; therefore, a 2-finger library of natural zinc fingers (from specific
animals, plants or fungi) can easily be constructed with enough variability to provide a
specific 2-finger combination for optimal binding to any 6 bp target site. Again, to
reduce the number of natural zinc finger modules that have to be constructed, a small
selection of natural zinc finger modules (e.g., 3) is chosen for each 3 bp binding
sequence (according to their predicted or determined recognition sequence). There are 64
(= 4^3) possible 3 bp binding sequences, so in the first instance fewer than 200 (i.e., 192)
natural zinc finger modules are constructed. These approximately 200 zinc finger modules
can be in either of 2 possible positions in the 2-finger construct, which gives approximately
40,000 (= 200^2) combinations of fingers to bind the 4096 possible 6 bp target sites. As in
Example 5, these 2-finger units are attached to Zif268 finger 1, which acts as an anchor
for DNA recognition.
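
The library-size arithmetic in this Example can be reproduced in a few lines. This is an illustrative sketch only; the variable names are hypothetical, and the figure of 3 modules per triplet is the "e.g." value given in the text.

```python
BASES = "ACGT"

n_triplets = len(BASES) ** 3                    # 4^3 possible 3 bp subsites
modules_per_triplet = 3                         # "e.g., 3" modules chosen per subsite
n_modules = n_triplets * modules_per_triplet    # modules to be constructed
n_sites = len(BASES) ** 6                       # 4^6 possible 6 bp target sites

n_pairs_exact = n_modules ** 2                  # ordered 2-finger combinations
n_pairs_rounded = 200 ** 2                      # the rounded figure used in the text
mean_pairs_per_site = n_pairs_exact / n_sites   # average candidates per target site

if __name__ == "__main__":
    print(n_triplets, n_modules, n_sites)
    print(n_pairs_exact, n_pairs_rounded, mean_pairs_per_site)
```

With exactly 192 modules the library holds 36,864 ordered pairs, i.e. on average 9 candidate 2-finger units per 6 bp site; the "approximately 40,000" figure corresponds to rounding the module count up to 200.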

a. Library Construction

The selected zinc finger modules are reverse translated from their amino acid sequences
and synthesised as oligonucleotides. Double stranded zinc finger cassettes for both
N-terminal and C-terminal fingers are created by PCR using primers specific for the
relevant zinc finger module. Each zinc finger module is amplified in 2 separate reactions,
as described in Example 5. The first PCR reaction uses primers which add TGEKP (SEQ
ID NO:3) linker peptides and AgeI and XmaI restriction sites, to the 3' and 5' ends,
respectively, to generate cassette B fragments. The second PCR reaction generates
cassette C fragments by adding a TGEKP (SEQ ID NO:3) linker and an XmaI site at the
5' end (this primer is the same as that used in cassette B production), and a sequence
encoding the sequence LRQKDGGGS (SEQ ID NO:2125) and a BamHI restriction site at
the 3' end. The final constructs are similar to that represented in Figure 6.
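
Reverse translation of a module can be sketched as a simple codon lookup. The example below is an assumption-laden illustration rather than the disclosed method: the one-codon-per-residue table `PREFERRED_CODON` is hypothetical (the text does not state which codons are used), and the `overrides` argument is an illustrative way of forcing a particular codon, for instance GGT for glycine so that the Thr-Gly codons of the TGEKP linker form an AgeI site (ACCGGT).

```python
# Hypothetical codon choice: one valid codon per amino acid (assumption).
PREFERRED_CODON = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
}


def reverse_translate(peptide, overrides=None):
    """Return a DNA sequence encoding peptide, one codon per residue.

    overrides maps a one-letter amino acid code to the codon to use instead
    of the default, e.g. {"G": "GGT"}.
    """
    table = dict(PREFERRED_CODON)
    if overrides:
        table.update(overrides)
    return "".join(table[aa] for aa in peptide.upper())


if __name__ == "__main__":
    print(reverse_translate("TGEKP"))
    print(reverse_translate("TGEKP", overrides={"G": "GGT"}))
```

The second call yields ACCGGTGAGAAGCCC, which contains the AgeI site and matches the GAGAAGCCC (Glu-Lys-Pro) stretch visible in the cassette primers of Table 10.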

b. Library Selection 


The collection of 3-finger zinc finger peptides produced above can be used to obtain 
specific domains for binding desired target sequences. Two exemplary approaches are 
described below. 

i). Non-Cloning Selections.

A library constructed as described herein can be used to select optimal zinc finger
domains for binding to any specified binding site. For instance, to select a peptide which
binds the sequence 5'-GGA-TAA-3', the binding site formed by annealing the
oligonucleotides #1#5#6.b and #1#5#6 RC (Table 6, above) can be used as a target site
(5'-GGA-TAA-GCG-3'). Selection of a zinc finger domain to bind such a target can be
conducted, for example, in the manner described in Example 4. Briefly, the zinc finger
library is diluted into 100 or more sub-libraries, which are screened as described above.
The most active sub-libraries collected are further diluted to create much smaller
sub-libraries, which are screened again, and so on. Following such a protocol, a library of
40,000 members can be fully screened and a high-affinity binder selected in just 3 rounds.
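
The round count quoted above follows from repeated 100-fold subdivision. The sketch below illustrates this under simplifying assumptions that are not stated in the text: a single best sub-library is carried forward each round, and every round splits it into the same number of pools. The function name is hypothetical.

```python
import math


def deconvolution_rounds(library_size, pools_per_round):
    """Return the pool sizes carried from round to round until one clone remains.

    Element 0 is the starting library; each later element is the size of a
    single sub-library after one further round of subdivision.
    """
    if pools_per_round < 2:
        raise ValueError("pools_per_round must be at least 2")
    sizes = [library_size]
    while sizes[-1] > 1:
        sizes.append(math.ceil(sizes[-1] / pools_per_round))
    return sizes


if __name__ == "__main__":
    sizes = deconvolution_rounds(40_000, 100)
    print(sizes, "rounds:", len(sizes) - 1)
```

For 40,000 members and 100 pools per round the pool sizes fall from 40,000 to 400, to 4, to 1, i.e. 3 rounds of screening.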

This selection procedure provides an extremely rapid method to select zinc finger
peptides to bind any desired target site. The procedure also has the advantages of
eliminating the need for cloning (as is required for methods such as phage display, see
below), and is not limited by library size.

ii). Phage Library Selections

Zinc finger polypeptide phage display libraries are made and used to select clones
encoding peptides that bind the desired nucleotide sequence, as described in co-owned
WO 98/53057. An exemplary phage display library contains peptides which bind target
sites with the sequence 5'-XXX-XXX-GCG-3', where X can be any nucleotide. Hence,
libraries of phage can be selected using the same target sites as described above. The
selection protocol for zinc fingers displayed on phage is briefly described below.

Protocol

The selection protocol is adapted from that described in co-owned international patent
application WO 98/53057.

The 3-finger constructs of the present Example are PCR amplified using universal
forward and reverse primers which contain sites for SfiI and NotI, respectively (called
NatPhageF and NatPhageR, respectively).

NatPhageF: GCAACTGCGGCCCAGCCGGCCATGGCAGAGGAACGCCCGTATG (SEQ ID
NO:2128)

NatPhageR: GAGTCATTCTGCGGCCGCGTCCTTCTGGCGCAGGTG (SEQ ID NO:2129)
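
The features of the two phage primers can be checked computationally. The sketch below assumes that NatPhageF reads GCAACTGCGGCCCAGCCGGCCATGGCAGAGGAACGCCCGTATG, and uses the standard recognition sequences for SfiI (GGCCNNNNNGGCC) and NotI (GCGGCCGC) together with the standard genetic code. The helper names are hypothetical.

```python
import re
from itertools import product

# Standard genetic code, built in the conventional TCAG codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

SFI_I = re.compile(r"GGCC[ACGT]{5}GGCC")   # SfiI: GGCCNNNNNGGCC
NOT_I = "GCGGCCGC"                          # NotI

NAT_PHAGE_F = "GCAACTGCGGCCCAGCCGGCCATGGCAGAGGAACGCCCGTATG"  # assumed reading
NAT_PHAGE_R = "GAGTCATTCTGCGGCCGCGTCCTTCTGGCGCAGGTG"


def translate_from_first_atg(seq):
    """Translate complete codons starting at the first ATG in seq."""
    start = seq.find("ATG")
    if start < 0:
        return ""
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


if __name__ == "__main__":
    print(bool(SFI_I.search(NAT_PHAGE_F)), NOT_I in NAT_PHAGE_R)
    print(translate_from_first_atg(NAT_PHAGE_F))
```

Under these assumptions the forward primer carries the SfiI site followed by codons for Met-Ala-Glu, and the reverse primer carries the NotI site.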

Backward PCR primers in addition introduce Met-Ala-Glu as the first three amino acid
residues of the zinc finger polypeptides, and these are followed by the residues of the
wild type or library zinc finger polypeptides as required. Cloning overhangs are
produced by digestion with SfiI and NotI where necessary. Nucleic acid encoding zinc
finger polypeptide fragments is ligated into similarly prepared Fd-Tet-SN vector. This is
a derivative of fd-tet-DOG1 (Hoogenboom et al. (1991) Nucl. Acids Res. 19:4133-4137),
in which a section of the pelB leader and a restriction site for the enzyme SfiI (underlined)
have been added by site-directed mutagenesis using the oligonucleotide:

5' CTCCTGCAGTTGGACCTGTGCCATGGCCGGCTGGGCCGCATA
GAATGGAACAACTAAAGC 3' (SEQ ID NO:2130)

that anneals in the region of the polylinker. Electrocompetent DH5α cells are
transformed with recombinant vector in 200 ng aliquots, grown for 1 hour in 2xTY
medium with 1% glucose, and plated on TYE containing 15 µg/ml tetracycline and 1%
glucose.

To generate phage for selections, tetracycline resistant colonies are transferred from
plates into 2xTY medium (16 g/litre Bacto tryptone, 10 g/litre Bacto yeast extract, 5 g/litre
NaCl) containing 50 µM ZnCl2 and 15 µg/ml tetracycline, and cultured overnight at 30°C
in a shaking incubator. Cleared culture supernatant containing phage particles is obtained
by centrifuging at 300 x g for 5 minutes.

Double stranded binding sites for use in selections are generated by annealing
complementary oligonucleotides, one of which is biotinylated.

Biotinylated DNA target sites (1 pmol) are bound to streptavidin-coated wells (Roche).
Phage supernatant solutions are diluted 1:10 in PBS selection buffer (PBS containing 50
µM ZnCl2, 2% Marvel, 1% Tween, 20 µg/ml sonicated salmon sperm DNA, and 10-fold
excess of competitor DNA), and 200 µl is applied to each well for 1 hour at 20°C. After
this time, the wells are emptied and washed 18 times with PBS containing 50 µM ZnCl2
and 1% Tween and 2 times in PBS containing 50 µM ZnCl2. Retained phage are eluted in
100 µl 0.1 M triethylamine and neutralised with an equal volume of 1 M Tris (pH 7.4).
Logarithmic-phase E. coli JM109 (100 µl) are infected with eluted phage (100 µl) and
used to prepare phage supernatants for subsequent rounds of selection. After 4 rounds of
selection, a 'pool' or 'mini-population' of phage is obtained, which bind the specified
target sequence. These pools of phage can be stored at -70°C for later use. Additionally,
E. coli infected with these phage pools can be plated to obtain individual clones, which
can be tested by ELISA for binding affinity and specificity to obtain the 'best' clone (see
Example 9, Quality Control).



Example 7: Complete Human Zinc Finger Module Library for Universal DNA 
Recognition. 

A complete, or nearly complete, library containing all zinc finger sequences which bind
a particular target site can be constructed using zinc finger modules to select 2-finger (or
3-finger) units which bind any 6 bp (or 9 bp) recognition sequence. Two exemplary
methods for construction of such a library are described.

a. Oligonucleotide-Based Library Construction. 

All zinc finger modules may be synthesised as single stranded oligonucleotides, as
described in Example 4. Zinc finger modules are made double stranded and TGEKP
(SEQ ID NO:3) linkers added by PCR with 5' and 3' primers specific for each individual
zinc finger module, to make cassettes. These cassettes can then be recombined, as
described in Example 5, to make random or deliberate combinations of zinc finger
modules comprising 2, 3, or more linked fingers.

b. PCR-Based Library Construction. 

Zinc finger proteins (especially of the Cys2His2 family) form the second most abundant
family of proteins in the human genome. Furthermore, in nature, zinc finger modules are
often linked by the canonical linker peptide TGEKP (SEQ ID NO:3), which begins
immediately after the second zinc-coordinating histidine residue. Therefore, the peptide
sequence HTGEKP (SEQ ID NO:2131) is commonly found between natural zinc finger
modules. Because of this consensus sequence, it has been possible to clone natural zinc
finger modules from the human genome (Becker, K.G., Nagel, J.W., Canning, R.D.,
Biddison, W.E., Ozato, K. & Drew, P.D. (1995) Hum. Mol. Genet. 4: 685-691; Bray, P.,
Lichter, P., Thiesen, H.-J., Ward, D.C. & Dawid, I.B. (1991) Proc. Natl. Acad. Sci. USA
88: 9563-9567), and the Arabidopsis genome (Meissner, R. & Michael, A.J. (1997) Plant
Mol. Biol. 33: 615-624), using redundant primers for PCR. See also Pellegrino et al.
(1991) Proc. Natl. Acad. Sci. USA 88:671-675. It is preferable to use genomic DNA or a
genomic DNA (gDNA) library, rather than a cDNA library, because transcription factors,
such as zinc finger proteins, are strongly regulated during the cell cycle, during development
and in response to extracellular signals. Hence, a cDNA library will probably not contain the
majority of zinc finger proteins, and will be biased towards highly expressed proteins.

A suitable protocol for the PCR-extraction of zinc finger modules from human genomic
DNA follows:

Genomic DNA is purified directly from human cells, or provided by a gDNA library.
gDNA libraries are preferable as they are commercially available (for example from
Clontech, ATCC, Stratagene etc.) and can be easily manipulated. PCR to extract zinc
finger modules can be conducted directly on purified gDNA, or the gDNA library can be
screened for zinc fingers containing the HTGEKP (SEQ ID NO:2131) motif before
carrying out PCR. To screen the gDNA library, any method known to one of skill in the
art, e.g. colony hybridisation, can be used. Phage containing gDNA inserts are plated
onto Escherichia coli XL-1 Blue bacterial lawns. At least 10^ phage plaques are
transferred to replica filters and screened with, for example, a 27-mer 32P-radiolabelled
degenerate oligonucleotide, which anneals to the conserved linker region of zinc finger
proteins and adjacent sequences. The sequence of a suitable degenerate probe (SEQ ID
NO:2132), and the amino acid sequence (SEQ ID NO:2133) to which it corresponds, is
shown below.

c^'/t'^/g ^/c/g ca'^/t ac'^/g gg^/g ga°/a aa°/a cc<=/t t^///t 

R/L  I/T/M  H  T  G  E  K  P  Y/F



Hybridisation is performed, e.g., for 16 hours at 42-50°C, following which filters are
washed 3-5 times, to remove non-specifically bound probe, in 0.2x standard saline citrate
(SSC)/0.1% SDS. Filters are then subjected to autoradiography or phosphorimaging to
determine positive plaques.
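
The amino acid pattern targeted by the degenerate probe (R/L, I/T/M, H-T-G-E-K-P, Y/F) can also be used for an in-silico screen of translated sequences. The sketch below expresses that pattern as a regular expression; the protein sequence `TOY_PROTEIN` is a made-up illustration and not a sequence from this disclosure, and the function name is hypothetical.

```python
import re

# Pattern taken from the probe's amino acid line: R/L I/T/M H T G E K P Y/F
LINKER_PATTERN = re.compile(r"[RL][ITM]HTGEKP[YF]")

# Hypothetical toy sequence containing two linker regions (illustration only).
TOY_PROTEIN = (
    "YKCPECGKSFSRSDNLVRHQRTHTGEKPY"
    "KCPECGKSFSQSSNLVRHQRTHTGEKPF"
    "QCRICMRNFS"
)


def find_linkers(protein):
    """Return (start_index, matched_text) for each linker-region match."""
    return [(m.start(), m.group()) for m in LINKER_PATTERN.finditer(protein.upper())]


if __name__ == "__main__":
    for start, text in find_linkers(TOY_PROTEIN):
        print(start, text)
```

Each match marks the boundary between two adjacent finger modules, which is the same feature that the hybridisation probe exploits at the DNA level.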

Positive plaques are picked into log-phase E. coli XL-1 Blue bacterial cultures and the
phage are harvested for PCR. 1 µl phage supernatant is added to 49 µl PCR pre-mix,
containing the oligonucleotide primers TGEKPfor (SEQ ID NO:2134) and TGEKPrev
(SEQ ID NO:2135) (shown below, annealing sequence in bold), and zinc finger modules
are amplified by 30 cycles of PCR. TGEKPfor (SEQ ID NO:2134) and TGEKPrev (SEQ
ID NO:2135) also contain XbaI and EcoRI restriction sites (underlined), respectively.
PCR products are separated on 1.5% TAE-agarose gels and fragments of approximately
120 bp (corresponding to 1 zinc finger module plus flanking sequences) are purified, as
described in Example 4. Additionally, fragments of approximately 220 bp, corresponding
to natural 2-finger units, can also be collected and used. Such products can be digested
with XbaI and EcoRI and cloned into a vector that has been digested so as to generate
compatible ends, such as, for example, pcDNA3.1(-) (Invitrogen) digested with EcoRI
and XbaI. Such a vector pool can then be used as a source for natural 1- or 2-zinc finger
modules, from which to construct 2- or 3-zinc finger peptides for selections as described
above. Zinc finger modules for cassette B can be amplified from such vectors using the
universal primers TGEKPXma (SEQ ID NO:2136) and TGEKPAge (SEQ ID NO:2137),
which anneal to the conserved TGEKP (SEQ ID NO:3) linker regions and add restriction
sites for the enzymes XmaI at the 5' terminus and AgeI at the 3' terminus, respectively
(restriction sites underlined). Cassette C units can be amplified using the primers
TGEKPXma (SEQ ID NO:2136) and TGEKPend (SEQ ID NO:2138), the latter of which
adds a 3' TRQKDGGGS (SEQ ID NO:2139) sequence incorporating a BamHI site
(underlined, see below). Two- and 3-finger constructs can then be constructed and
screened as described in the Examples above.



TGEKPfor: TTAGTCTAGA^/gCA^/tAC^/gGG^/gGA^/aAA^/aCC (SEQ ID
NO:2134)

TGEKPrev: TAC TGAATTC^ /^GG^/tTT^/tTC^/cCC^/cGT^/aTG (SEQ ID
NO:2135)

TGEKPXma: TCTAGA^/qCA^/tCCCGGGGA^/aAA^/aCC (SEQ ID NO:2136)

TGEKPAge: GAATTC^/ a GG^/tTT^/tTC ACCGGT% TG (SEQ ID NO:2137)

TGEKPend: AGTGTGGTGGAATTC^/aGGGGATCCGCCGCCGTC%TT
^/tTG^/cCG^/cGT^/aTG (SEQ ID NO:2138)



Example 8: Microarray Analysis.

Microarray analysis can also be used to determine the binding site specificity of 2- and
3-finger peptides. For example, a 3-zinc finger library with finger 1 fixed as Zif268 finger
1 recognises the sequence 5'-XXX-XXX-GCG-3', where X is any specified nucleotide.
Hence, there are 4096 (= 4^6) unique binding sites for such a library. All 4096 of these
sites can be arrayed onto a single glass slide, allowing a specified 2-finger peptide to be
screened against every possible binding site at once. A suitable protocol for such an
experiment is described in Martha L. Bulyk, Xiaohua Huang, Yen Choo, & George
M. Church (Proc. Natl. Acad. Sci. USA: Vol. 98, No. 13, 7158-7163, June 19, 2001),
which is incorporated by reference in its entirety. See also co-owned WO 01/25417, the
disclosure of which is hereby incorporated by reference in its entirety.

The amount of binding to each target sequence can be visualised and quantified using
simple fluorescence measurements. For example, the zinc finger peptide can be
expressed in vitro, or on the surface of phage. Isolated zinc finger peptides may contain
an epitope tag for labelling purposes, whereas bound phage can be detected using a
primary antibody against a phage coat protein, such as gVIII. A secondary antibody, such
as one conjugated to R-phycoerythrin, may be used to provide a visible signal when a
suitable substrate is applied.
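
The set of sites to be arrayed can be generated programmatically. The following sketch enumerates every sequence of the form 5'-XXX-XXX-GCG-3'; the function name and the `anchor` parameter are hypothetical and serve only to illustrate the 4^6 count given above.

```python
from itertools import product


def enumerate_sites(anchor="GCG"):
    """Return all 9 bp sites consisting of 6 variable bases followed by anchor."""
    return ["".join(variable) + anchor for variable in product("ACGT", repeat=6)]


SITES = enumerate_sites()

if __name__ == "__main__":
    print(len(SITES), SITES[0], SITES[-1])
```

The list contains 4096 distinct sites and includes, for example, the sites 5'-GCG-TGG-GCG-3' and 5'-GGA-TAA-GCG-3' used as targets in the preceding Examples.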



Example 9: Quality Control.

Particular 2- or 3-finger peptides can be screened to determine their specificity or affinity,
as desired.

a. Phage ELISA Assay

Phage supernatants from Round 4 of selection (Example 6, supra) are used to infect
E. coli JM109 bacteria, and grown to prepare fresh supernatants for zinc finger phage
ELISA, using standard procedures as described previously (Choo, Y. & Klug, A. (1994)
Proc. Natl. Acad. Sci. USA 91, 11163-11167; Choo, Y. & Klug, A. (1994) Proc. Natl.
Acad. Sci. USA 91, 11168-11172). Briefly, 5'-biotinylated, positionally randomised
oligonucleotide libraries, containing Zif268 binding site variants, are synthesised by
annealing complementary oligonucleotides as described supra. DNA libraries are added
to streptavidin-coated ELISA wells (Boehringer-Mannheim) in PBS containing 50 µM
ZnCl2 (PBS/Zn). Phage solution (overnight bacterial culture supernatant diluted 1:10 in
PBS/Zn containing 2% Marvel, 1% Tween and 20 µg/ml sonicated salmon sperm DNA) is
applied to each well (50 µl/well). Binding is allowed to proceed for one hour at 20°C.
Unbound phage are removed by washing 7 times with PBS/Zn containing 1% Tween,
then 3 times with PBS/Zn. Bound phage are detected by ELISA using horseradish
peroxidase-conjugated anti-M13 IgG (Pharmacia Biotech) and the colourimetric signal is
quantitated using SOFTMAX 2.32 (Molecular Devices).

For rapid validation, the entire population of phage from Round 4 selection can be
assayed in two ELISA wells: one containing the target DNA binding site, and one
containing a control DNA binding site with between 1 and 5 base changes from the target
sequence. A selection is deemed to be successful if the ELISA signal (representing DNA
binding) is higher in the target well than in the control well.

The higher the signal measured above, the greater the population of specific binding
clones. However, individual low values for such a procedure do not necessarily indicate
a failure of the selection, as there may be individual high affinity / specificity clones
within the Round 4 phage population that may be masked by other non-specific clones.
Nevertheless, this assay provides a quick profile of the overall quality of selection.



For a more detailed validation, individual phage clones are recovered from Round 4 by
plating out infected bacterial colonies on agar. Fresh phage supernatants are prepared
from these colonies and assayed by ELISA, as described above.

Finally, the coding sequence of individual zinc finger clones can be amplified by PCR
using external primers complementary to phage sequence, and the PCR products are then
sequenced to determine the amino acid sequence of the selected zinc fingers.

As an alternative, individual 3-finger peptides can be analysed by gel-shift assays or by
microarray screening, as described infra. See also WO 00/41566, WO 00/42219 and
WO 01/25417.

b. Gel-Shift Assay 

Peptides are assayed using 32P end-labelled synthetic oligonucleotide duplexes containing
the appropriate binding site sequences.

DNA binding reactions contain the appropriate zinc-finger peptide, binding site and 1 ng
competitor DNA (e.g., poly dI-dC or salmon sperm DNA) in a total volume of 10 µl,
which contains: 20 mM Bis-tris propane (pH 7.0), 100 mM NaCl, 5 mM MgCl2, 50 µM
ZnCl2, 5 mM DTT, 0.1 mg/ml BSA, 0.1% Nonidet P40. Incubations are performed at
room temperature for 1 hour.

To determine the concentration of zinc finger peptide produced in the in vitro expression
system, crude protein samples are used in gel-shift assays against a dilution series of the
appropriate binding site. Binding site concentration is always well above the Kd of the
peptide, but ranges from a higher concentration than the peptide (50 mM), at which all
available peptide binds DNA, to a lower concentration (3-5 mM), at which all DNA is
bound. Controls are carried out to ensure that binding sites are not shifted (i.e., bound) in
the absence of zinc finger peptide. The reaction mixtures are then separated on a 7%
native polyacrylamide gel. Radioactive signals are quantitated by PhosphorImager
analysis to determine the amount of shifted binding site, and hence, the concentration of
active zinc finger peptide.

Dissociation constants (Kd) are determined in parallel to the calculation of active peptide
concentration. For determination of Kd, serial 3, 4 or 5-fold dilutions of crude peptide are
made and incubated with radiolabelled binding site (10 pM - 10 nM depending on the
peptide), as above. Samples are run on 7% native polyacrylamide gels and the
radioactive signals quantitated by PhosphorImager analysis. The data are then analysed
according to linear transformation of the binding equation and plotted in CA-Cricket
Graph III (Computer Associates Inc. NY) to generate the apparent dissociation constants.
The Kd values reported are the average of at least two separate determinations.
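
One possible linear transformation of the binding equation is the double-reciprocal form, sketched below. This is an illustration only: the text does not specify which transformation is used, and the sketch assumes a simple 1:1 binding model in which free peptide approximates total active peptide, so that the fraction of binding site shifted is θ = [P]/(Kd + [P]) and 1/θ = 1 + Kd·(1/[P]). The data are synthetic and the function names are hypothetical.

```python
def fraction_bound(peptide_nM, kd_nM):
    """Fraction of binding site shifted under a simple 1:1 binding model."""
    return peptide_nM / (kd_nM + peptide_nM)


def fit_kd_double_reciprocal(peptide_nM, theta):
    """Estimate Kd as the least-squares slope of 1/theta versus 1/[peptide]."""
    xs = [1.0 / p for p in peptide_nM]
    ys = [1.0 / t for t in theta]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx  # slope of the double-reciprocal line equals Kd


# Synthetic, noise-free data for a 2-fold dilution series (illustration only).
TRUE_KD_NM = 2.0
CONCS_NM = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
THETAS = [fraction_bound(c, TRUE_KD_NM) for c in CONCS_NM]
KD_ESTIMATE = fit_kd_double_reciprocal(CONCS_NM, THETAS)

if __name__ == "__main__":
    print(round(KD_ESTIMATE, 6))
```

With real gel-shift data the reciprocal transform amplifies noise at low peptide concentrations, which is one reason for averaging at least two separate determinations as described above.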

c. Microarray Assay 

Selected zinc finger domains can also be assayed for binding site specificity using the
microarray analysis outlined in Example 8.

All publications mentioned in the above specification are herein incorporated by
reference. Various modifications and variations of the described methods and system of
the invention will be apparent to those skilled in the art without departing from the scope
and spirit of the invention. Although the invention has been described in connection with
specific preferred embodiments, it should be understood that the invention as claimed
should not be unduly limited to such specific embodiments. Indeed, various
modifications of the described modes for carrying out the invention which are apparent to
those skilled in molecular biology or related fields are intended to be within the scope of
the following claims.

CLAIMS

1. A composite binding polypeptide comprising a first natural binding domain
derived from a first natural binding polypeptide, and a second natural binding domain
derived from a second natural binding polypeptide, wherein said first and second natural
binding polypeptides may be the same or different; which polypeptide binds to a target,
said target differing from the natural target of both the first and the second binding
polypeptides.

2. A composite polypeptide according to claim 1, wherein said first and second
natural binding polypeptides are different polypeptides.

3. A composite polypeptide according to claim 1 or claim 2, comprising three or
more natural binding domains.

4. A composite polypeptide according to any preceding claim, wherein the binding
domains are nucleic acid binding domains.

5. A composite polypeptide according to claim 4, which is a nucleic acid binding
polypeptide.

6. A composite polypeptide according to claim 4 or claim 5 which is a zinc finger
polypeptide, and the natural binding domains are zinc finger domains.

7. A composite polypeptide according to claim 6, which comprises a Cys2-His2 zinc
finger binding domain.

8. A composite polypeptide according to claim 6 or claim 7, which comprises a
Cys3-His zinc finger binding domain.

9. A composite polypeptide according to any preceding claim, which comprises 6 or
more natural binding domains.



10. A composite polypeptide according to claim 9, wherein 6 natural binding domains
are arranged in a 3x2 conformation, separated by linker sequences.

11. A chimeric polypeptide comprising: 

(a) a binding polypeptide according to any preceding claim, and 

(b) a biological effector domain. 

11. A library of natural binding domains.

12. A library according to claim 11, comprising a plurality of natural binding domains
from which a polypeptide according to any one of claims 1 to 10 can be assembled.

13. A library of natural zinc finger nucleic acid binding domains, wherein said zinc
finger domains comprise a linker attached thereto.

14. A library according to claim 13, wherein the linker comprises the sequence
TGEKP.

15. A method for selecting a binding polypeptide capable of binding to a target site,
comprising:

(a) providing a library of natural binding domains; 

(b) assembling two or more of said domains to form a composite polypeptide; 

(c) screening said composite polypeptide against the target site in order to 
determine its ability to bind the target site. 

16. A method according to claim 15, wherein the natural binding domains are zinc 
finger binding domains. 

17. A method according to claim 15 or claim 16, wherein two or more composite
polypeptides comprising two or more domains which are selected for binding to two or
more target sites are combined to provide a composite polypeptide which binds to an
aggregate binding site comprising the two or more target binding sites.

18. A method for designing a composite binding polypeptide, comprising: 

(a) providing information defining a target site; 

(b) selecting, from a database of natural binding domains, sequences of binding
domains which are predicted to bind to the target site by the application of one or more
rules which define target binding interactions for the binding domains; and

(c) displaying the sequences of the binding domains, separated by linker 
sequences, and optionally assembling the binding polypeptide from a library of said 
domains. 

19. A method according to claim 18, wherein the binding domains are zinc finger 
domains. 

20. A method according to claim 19, wherein the zinc fingers are considered to bind 
to a nucleic acid triplet and domains are selected according to one or more of the 
following rules: 

(a) if the 5' base in the triplet is G, then position +6 in the α-helix is Arg; or 
position +6 is Ser or Thr and position ++2 is Asp; 

(b) if the 5' base in the triplet is A, then position +6 in the α-helix is Gln and ++2 
is not Asp; 

(c) if the 5' base in the triplet is T, then position +6 in the α-helix is Ser or Thr 
and position ++2 is Asp; 

(d) if the 5' base in the triplet is C, then position +6 in the α-helix may be any 
amino acid, provided that position ++2 in the α-helix is not Asp; 

(e) if the central base in the triplet is G, then position +3 in the α-helix is His; 

(f) if the central base in the triplet is A, then position +3 in the α-helix is Asn; 

(g) if the central base in the triplet is T, then position +3 in the α-helix is Ala, Ser 
or Val; provided that if it is Ala, then one of the residues at -1 or +6 is a small residue; 

(h) if the central base in the triplet is C, then position +3 in the α-helix is Ser, Asp, 
Glu, Leu, Thr or Val; 

(i) if the 3' base in the triplet is G, then position -1 in the α-helix is Arg; 

(j) if the 3' base in the triplet is A, then position -1 in the α-helix is Gln; 

(k) if the 3' base in the triplet is T, then position -1 in the α-helix is Asn or Gln; 

(l) if the 3' base in the triplet is C, then position -1 in the α-helix is Asp. 
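
By way of illustration only, rules (a) to (l) above can be written as a small checking routine. The following Python sketch is not part of the claimed method; the function and variable names are hypothetical, residues are given as three-letter codes, and the set of "small" residues (Gly, Ala, Ser) is an assumption, since the rules do not enumerate it. The sketch requires all three bases to satisfy their rules, which is stricter than "one or more of the following rules".

```python
# Assumption: rule (g) does not list the "small" residues; Gly, Ala, Ser is a guess.
SMALL_RESIDUES = frozenset({"Gly", "Ala", "Ser"})

# Rules (e)-(h): central base -> residues allowed at position +3.
CENTRAL_RULES = {
    "G": {"His"},
    "A": {"Asn"},
    "T": {"Ala", "Ser", "Val"},
    "C": {"Ser", "Asp", "Glu", "Leu", "Thr", "Val"},
}

# Rules (i)-(l): 3' base -> residues allowed at position -1.
THREE_PRIME_RULES = {
    "G": {"Arg"},
    "A": {"Gln"},
    "T": {"Asn", "Gln"},
    "C": {"Asp"},
}


def five_prime_ok(base, plus6, pp2):
    """Rules (a)-(d): 5' base against position +6 and the position written ++2."""
    if base == "G":
        return plus6 == "Arg" or (plus6 in ("Ser", "Thr") and pp2 == "Asp")
    if base == "A":
        return plus6 == "Gln" and pp2 != "Asp"
    if base == "T":
        return plus6 in ("Ser", "Thr") and pp2 == "Asp"
    return pp2 != "Asp"  # base == "C": any residue at +6


def central_ok(base, minus1, plus3, plus6):
    """Rules (e)-(h), including the proviso on Ala in rule (g)."""
    if plus3 not in CENTRAL_RULES[base]:
        return False
    if base == "T" and plus3 == "Ala":
        return minus1 in SMALL_RESIDUES or plus6 in SMALL_RESIDUES
    return True


def finger_matches_triplet(triplet, minus1, plus3, plus6, pp2):
    """Return True if the helix residues satisfy the rules for a 5'->3' triplet."""
    if len(triplet) != 3 or any(b not in "ACGT" for b in triplet):
        raise ValueError("triplet must be three bases drawn from A, C, G, T")
    five, central, three = triplet
    return (
        five_prime_ok(five, plus6, pp2)
        and central_ok(central, minus1, plus3, plus6)
        and minus1 in THREE_PRIME_RULES[three]
    )
```

Under these assumptions, a call such as `finger_matches_triplet("GGG", "Arg", "His", "Arg", "Ala")` satisfies rules (a), (e) and (i).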

21. A method according to claim 19, wherein the zinc fingers are considered to bind 
to a nucleic acid quadruplet and domains are selected according to one or more of the 
following rules: 

(a) if base 4 in the quadruplet is G, then position +6 in the α-helix is Arg or Lys; 

(b) if base 4 in the quadruplet is A, then position +6 in the α-helix is Glu, Asn or 
Val; 

(c) if base 4 in the quadruplet is T, then position +6 in the α-helix is Ser, Thr, Val 
or Lys; 

(d) if base 4 in the quadruplet is C, then position +6 in the α-helix is Ser, Thr, Val, 
Ala, Glu or Asn; 

(e) if base 3 in the quadruplet is G, then position +3 in the α-helix is His; 

(f) if base 3 in the quadruplet is A, then position +3 in the α-helix is Asn; 

(g) if base 3 in the quadruplet is T, then position +3 in the α-helix is Ala, Ser or 
Val; provided that if it is Ala, then one of the residues at -1 or +6 is a small residue; 

(h) if base 3 in the quadruplet is C, then position +3 in the α-helix is Ser, Asp, 
Glu, Leu, Thr or Val; 

(i) if base 2 in the quadruplet is G, then position -1 in the α-helix is Arg; 

(j) if base 2 in the quadruplet is A, then position -1 in the α-helix is Gln; 

(k) if base 2 in the quadruplet is T, then position -1 in the α-helix is His or Thr; 

(l) if base 2 in the quadruplet is C, then position -1 in the α-helix is Asp or His; 

(m) if base 1 in the quadruplet is G, then position +2 is Glu; 

(n) if base 1 in the quadruplet is A, then position +2 is Arg or Gln; 

(o) if base 1 in the quadruplet is C, then position +2 is Asn, Gln, Arg, His or Lys; 

(p) if base 1 in the quadruplet is T, then position +2 is Ser or Thr. 
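
By way of illustration only, the quadruplet rules (a) to (p) above lend themselves to a table-driven form. The Python sketch below is not part of the claimed method; the names `QUADRUPLET_RULES` and `allowed_residues` are hypothetical. Bases are passed keyed by their number (1 to 4) so that no assumption is made about the orientation in which the quadruplet is written, and the proviso on Ala in rule (g) is not encoded.

```python
# Base number -> (helix position, {base: residues permitted by rules (a)-(p)}).
# The Ala proviso of rule (g) is deliberately left out of this lookup table.
QUADRUPLET_RULES = {
    4: ("+6", {
        "G": {"Arg", "Lys"},
        "A": {"Glu", "Asn", "Val"},
        "T": {"Ser", "Thr", "Val", "Lys"},
        "C": {"Ser", "Thr", "Val", "Ala", "Glu", "Asn"},
    }),
    3: ("+3", {
        "G": {"His"},
        "A": {"Asn"},
        "T": {"Ala", "Ser", "Val"},
        "C": {"Ser", "Asp", "Glu", "Leu", "Thr", "Val"},
    }),
    2: ("-1", {
        "G": {"Arg"},
        "A": {"Gln"},
        "T": {"His", "Thr"},
        "C": {"Asp", "His"},
    }),
    1: ("+2", {
        "G": {"Glu"},
        "A": {"Arg", "Gln"},
        "C": {"Asn", "Gln", "Arg", "His", "Lys"},
        "T": {"Ser", "Thr"},
    }),
}


def allowed_residues(bases):
    """Map {base number: base} to {helix position: set of permitted residues}."""
    return {
        position: table[bases[number]]
        for number, (position, table) in QUADRUPLET_RULES.items()
    }
```

A candidate domain could then be compared position by position against the returned sets, with the Ala proviso checked separately.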

22. A method according to claim 19, wherein the zinc fingers are considered to bind 
to a nucleic acid quadruplet and domains are selected according to one or more of the 
following rules: 

(a) if base 4 in the quadruplet is G, then position +6 in the α-helix is Arg; or 
position +6 is Ser or Thr and position ++2 is Asp; 

(b) if base 4 in the quadruplet is A, then position +6 in the α-helix is Gln and ++2 
is not Asp; 

(c) if base 4 in the quadruplet is T, then position +6 in the α-helix is Ser or Thr 
and position ++2 is Asp; 

(d) if base 4 in the quadruplet is C, then position +6 in the α-helix may be any 
amino acid, provided that position ++2 in the α-helix is not Asp; 

(e) if base 3 in the quadruplet is G, then position +3 in the α-helix is His; 

(f) if base 3 in the quadruplet is A, then position +3 in the α-helix is Asn; 

(g) if base 3 in the quadruplet is T, then position +3 in the α-helix is Ala, Ser or 
Val; provided that if it is Ala, then one of the residues at -1 or +6 is a small residue; 

(h) if base 3 in the quadruplet is C, then position +3 in the α-helix is Ser, Asp, 
Glu, Leu, Thr or Val; 

(i) if base 2 in the quadruplet is G, then position -1 in the α-helix is Arg; 

(j) if base 2 in the quadruplet is A, then position -1 in the α-helix is Gln; 

(k) if base 2 in the quadruplet is T, then position -1 in the α-helix is Asn or Gln; 

(l) if base 2 in the quadruplet is C, then position -1 in the α-helix is Asp; 

(m) if base 1 in the quadruplet is G, then position +2 is Asp; 

(n) if base 1 in the quadruplet is A, then position +2 is not Asp; 

(o) if base 1 in the quadruplet is C, then position +2 is not Asp; 

(p) if base 1 in the quadruplet is T, then position +2 is Ser or Thr. 

23. The method of any of claims 18-22, further comprising the step of synthesizing a 
polynucleotide encoding the binding polypeptide. 

24. A computer-implemented method for designing a zinc finger polypeptide, 
comprising the steps of: 

(a) providing a system comprising at least storage means for storing data relating 
to a library of zinc fingers; storage means for storing a rule table; means for inputting 
target nucleic acid sequence data; processing means for generating a result; and means for 
outputting the result; 

(b) inputting sequence data for a target nucleic acid molecule; 

(c) defining a first target zinc finger binding site in said nucleic acid molecule; 

(d) interrogating the zinc finger library and rule table storage means, comparing 
zinc fingers to the target zinc finger binding site according to the rule table and selecting 
zinc finger data identifying a zinc finger capable of binding to said target site; 

(e) defining at least one further target zinc finger binding site and repeating step 
(d); and 

(f) outputting the selected zinc finger data. 
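
By way of illustration only, steps (b) to (f) above can be sketched in Python as follows. Nothing here is prescribed by the claim: the `Finger` record, the `design` function and the finger names in the usage note are hypothetical, the rule table is a deliberately reduced toy holding only a few three-base rules that do not depend on position ++2, and the target is assumed to be divided into consecutive, non-overlapping three-base sites read 5' to 3'.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finger:
    """One library entry (hypothetical record layout)."""
    name: str
    minus1: str  # residue at helix position -1
    plus3: str   # residue at helix position +3
    plus6: str   # residue at helix position +6


# Toy rule table: helix position -> {base: permitted residues}.
# Only rules that do not involve position ++2 are included here.
RULE_TABLE = {
    "plus6": {"G": {"Arg"}},  # 5' base, first alternative only
    "plus3": {
        "G": {"His"},
        "A": {"Asn"},
        "C": {"Ser", "Asp", "Glu", "Leu", "Thr", "Val"},
    },  # central base
    "minus1": {"G": {"Arg"}, "A": {"Gln"}, "T": {"Asn", "Gln"}, "C": {"Asp"}},  # 3' base
}

# Helix positions read against the 5', central and 3' base of a site, in that order.
SLOTS = ("plus6", "plus3", "minus1")


def binds(finger, site):
    """Step (d): compare one finger to one site according to the rule table."""
    return all(
        getattr(finger, slot) in RULE_TABLE[slot].get(base, set())
        for slot, base in zip(SLOTS, site)
    )


def design(target, library):
    """Steps (c)-(f): define sites, select a finger for each, return the result."""
    usable = len(target) - len(target) % 3  # ignore a trailing partial site
    sites = [target[i:i + 3] for i in range(0, usable, 3)]
    selected = []
    for site in sites:
        hit = next((f for f in library if binds(f, site)), None)
        selected.append((site, hit.name if hit is not None else None))
    return selected
```

With a two-entry library such as `[Finger("F1", "Arg", "His", "Arg"), Finger("F2", "Gln", "Asn", "Arg")]`, calling `design("GGGGAA", library)` pairs each site with the first library finger that satisfies the toy table, and reports `None` for any site the table cannot cover.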

25. A method according to claim 24, further comprising sending instructions to an 
automated chemical synthesis system to assemble a zinc finger polypeptide as defined by 
the zinc finger data obtained in (f). 

26. A method according to claim 25, wherein the zinc finger polypeptide is tested for 
binding to the target site, and data from said testing is used to select, from a plurality of 
candidates, a zinc finger polypeptide capable of binding to the target site. 

27. A method according to any one of claims 24 to 26, wherein two or more zinc 
finger polypeptides are combined to form a zinc finger polypeptide capable of binding to 
an aggregate binding site comprising two or more target sites. 

28. A method according to claim 24, wherein the rule table comprises rules as set 
forth in any one of claims 21 to 23. 